1 Introduction

Perceiving is, at least sometimes, something we do. Consider cases of looking, listening, and sniffing, rather than merely seeing, hearing, and smelling. In these cases, there is simultaneously something passive and active about perceptual experience. How best to reconcile these features of perception has been called the ‘puzzle of perceptual agency’ (Watzl, 2017; O’Shaughnessy, 2003).

Accepting that there is such a thing as perceptual agency requires giving up on a familiar story about the mind according to which perception is on the purely receptive end of our mental lives. According to this story, perception supplies us with mere input, ready for further processing, eventually leading to the output of an action. Hurley (1998a) dubs this standard view the ‘classical sandwich model’ of the mind: cognition is the ‘filling’, sandwiched on either side by perception and action (ibid. 401).

Rejecting the classical sandwich model does not automatically entail the kind of embodied cognition that Hurley advocates. There are multiple approaches to thinking about perceptual agency, and which approach one takes is indicative of one’s broader positioning in the philosophy of mind. Naomi Eilan suggests that we can distinguish between two broad approaches to thinking about perceptual agency (Eilan, 2006, pp. 85–86). On the one hand, there are those like J. J. Gibson who emphasize the role of bodily action in perception. For Gibson (1976, 1979), perception is active because it involves moving around in the environment, picking up observer-relative information. On this view, perception is to be analyzed in terms of affordances (i.e., possibilities for action). More recent work in this direction emphasizes the role of sensorimotor know-how in bringing about – or ‘enacting’ – perceptual experience (Noë, 2004). To see not just a picture-like snapshot of a tomato, but the tomato as a voluminous object with unseen sides, requires the possession, and maybe even the exercise, of sensorimotor skills.

On the other hand, there are those who approach the issue of perceptual agency by emphasizing the role of mental action in perception, especially the role played by attention. Consider intentionally focusing your visual attention on some aspect of your environment. According to a widespread view, this is an example of a mental action. The central idea behind the second approach to the puzzle of perceptual agency then is to appeal to a form of attention-based mental agency. Eilan names William James as an early representative of this approach; Sebastian Watzl is a contemporary defender of the view.

Eilan’s sorting presupposes a coherent distinction between ‘mental action’ and ‘bodily action’. But how should we understand this distinction? On some accounts mental and bodily action are mutually exclusive categories. In a debate with Richards (1976), Gibson (1976) conceives of mental activity as “operations of the mind upon the deliverances of the senses” (p. 234, emphasis added), while bodily activities involved in perception occur “before sensations have been aroused by stimuli” (p. 235, emphasis added). Gibson’s distinction reflects a familiar conception of the mind in which the mental/bodily distinction is aligned with the inner/outer distinction. In Richards’s terms: “the data [of the senses] are acted upon by the mind or brain so as to provide elements of structure that are not antecedently present” (p. 219). The mental is then a kind of ‘inner realm’, and mental action a kind of ‘inner action’ operating upon inner mental elements (e.g., representations).

The alignment of the mental with an inner realm has been criticized extensively in the literature on embodied cognition (Dreyfus & Taylor, 2015, pp. 11–12; Hurley, 1998a). Dreyfus and Taylor (2015) see this mutually exclusive sorting as an expression of what they call a “mediational” worldview. This worldview sees the contact between mind and world as mediated by intermediaries (such as mental representations). Rejecting the mediational worldview implies a rejection of a neat sorting between mind and body: “What we think of as mind and body interpenetrate” (Dreyfus & Taylor, 2015, 12). Embodied cognition is therefore an attempt to overcome the opposition between mind and body, and to rethink the mind as embodied.

In the literature on mental action, an understanding of mental action as ‘inner action’ has likewise been criticized on the grounds that canonical mental actions such as thinking, calculating and counting do not always respect the inner/outer boundary: they can involve bodily activity (Levy, 2019; Melser, 2004; Peacocke, 2021; Soteriou, 2009). Once the mental-action-as-inner-action picture is rejected, it is far from obvious that bodily action and mental action are mutually exclusive categories. Recent philosophy of action disagrees sharply over the nature, scope and demarcation of mental action. For example, Soteriou (2009) proceeds rather cautiously and states that “at least a variety” (233) of mental action can be demarcated by considering actions that “do not seem to require for their successful performance the performance of any overt bodily action” (233).

This development in the mental action literature points to a possibility we would like to explore in this paper, namely, that perceptual attention is not an inner mental action, but an embodied mental action: a mental action that is, in part, constituted by bodily activity. In line with the mental action route, perception is active because it involves a form of attention-based mental agency. But, in line with enactivists and embodied cognition theorists, mental action itself can be constituted by bodily movement.

In considering this question, it is important to distinguish between two metaphysical relations that are at stake in the literature: we need to distinguish between whether perceptual attention essentially involves bodily activity and whether perceptual attention can be constituted (in part) by bodily activity. The first question seeks the minimal supervenience base of a type of phenomenon (in Block’s terms: “the metaphysically necessary part of a metaphysically sufficient condition”; Block, 2005, p. 264, emphasis in original), while the second is after the constitution of a particular phenomenon or subtype. In what follows we focus on the latter. We spell out the distinction in more detail in Sect. 2.

We develop our argument by engaging with what we take to be the most sophisticated articulation of the mental activity view of perceptual agency, namely, Sebastian Watzl’s Structuring Mind (2017). Watzl positions the mental activity route explicitly in opposition to the embodiment route:

The activity view of attention helps to account for many intuitions that motivate enactive views of perception and takes on board much of what these views say without commitment to their claims about embodiment. The activity view locates the active element in perception in a mental activity, attending, which infuses perceptual experience with agency. (45–6)

In this paper, we want to challenge the idea that the mental activity view can accommodate enactivist intuitions without taking on board their commitments about embodiment. On the contrary, in order to capture the intuitions that motivate enactivism, Watzl needs to commit to the idea that perceptual attention involves what we call embodied mental action: mental action that is (in part) constituted by bodily activity. Our analysis denies the forced choice between bodily and mental activity and opens up the possibility of a third route to solving the puzzle of perceptual agency: perception is active because it is infused with embodied mental action.

The paper is organized as follows. In Sect. 2, we briefly introduce Watzl’s account of attention and situate his position vis-à-vis enactivism. In Sect. 3, we investigate the role of the body in overt attention. Here we argue that overtly perceptually attending to something is a mental action that constitutively involves the body. In Sect. 4, we investigate the role of the body in covert perceptual attention. Contra Watzl, we argue that the standard cases of covert attention (i.e., the Posner paradigm) do implicate the body in multiple ways.

2 Watzl on attention as structuring mind

According to Watzl (2017), although we have a rich folk-psychological sense of what attention is and the role it plays in our lives, and although attention has been a major topic in the empirical mind and brain sciences, we lack a satisfying and unifying philosophical account of the phenomenon. Watzl aims to provide just such an account. On the view he develops, attention is a holistic phenomenon that cuts across many of the standard divisions in the philosophy of mind and instead concerns how the elements of mind hang together (cf. Mole, 2011). On his view, attention is something we do. More specifically, it is the personal-level activity of organizing or structuring the mind. To develop the view, Watzl introduces the technical notion of a ‘priority structure’, which is a system of occurrent, personal-level psychological phenomena (states, events, and processes, and their parts) organized by a relation of relative priority (Watzl, 2017: § 4). When the psychological phenomena in question are conscious, they form what Watzl calls a ‘centrality system’ (ibid. § 9). The basic idea is that conscious experience exhibits an organization: roughly, some parts of the experience are more central, whereas others form a periphery. A simple case of this is foreground-background structure in perceptual experience. As Watzl points out, this organization is constantly undergoing dynamic changes – now this is central and now that. Attention is the ongoing activity of regulating – creating, changing, and maintaining – priority structures.

Consider the example of going to see a jazz concert. Your auditory attention might first be directed towards the sound of the trumpet, then the drums, then you might briefly visually attend to the audience around you, to see if they are enjoying the show, then your attention might wander to thoughts about what you will have for dinner, etc. Watzl thinks we should account for these changes not purely in terms of differences in appearances (i.e., the objects or contents of consciousness) but also in terms of the structure of experience. When I attend to the trumpet, for example, I make the trumpet central vis-à-vis the drums, and this is a structural relationship pertaining between parts of the phenomenal field.

Importantly, this is all something that we as subjects do. Attention is an activity we are constantly engaged in. According to Watzl, the activity of attending can either be actively guided by a complex hierarchy of goals, action plans, and intentions, which are said to make up the subject’s ‘executive control system’, as in the case of so-called top-down endogenous attention shifts (ibid. 138–49), or it can be passively guided by the salience (understood in terms of imperatival content) of personal-level phenomena that are not part of the executive control system (e.g., perceptions, emotions etc.), as in the case of so-called bottom-up exogenous attention shifts (e.g., attention capture) (ibid. 114–35). The former case is seen by Watzl as a paradigm example of a full-blooded voluntary, intentional mental action (ibid. 139). However, even in the latter case he thinks the subject is doing something: just as tapping one’s foot absent-mindedly to the sound of music in the next room is an example of a (subintentional) activity of a person (cf. O’Shaughnessy, 1981), so too is having one’s attention drawn or held by something. In such cases, the relevant activity is not guided by the subject’s intentions but is rather guided directly by, e.g., perceptions of the environment (Watzl, 2017, p. 115; cf. Watzl, 2014): it is the subject-level state of hearing a loud bang that guides the bystander to attend to the car crash.

To summarize: a priority structure consists of multiple psychological elements and a priority relation that holds between them. Attending is the activity of regulating the evolution of such priority structures.

Let us now reconstruct Watzl’s commitments about the involvement of bodily activity in perceptual attention, and how he contrasts his position with enactivism. As noted above, it is important to distinguish between the claim that perceptual attention is essentially embodied and the claim that perceptual attention can be constituted (in part) by bodily activity. The first is a view about the minimal supervenience base of perceptual attention; the second is about the realizers of a particular instance or a subtype of perceptual attention. Settling on the minimal supervenience base of a phenomenon does not determine which factors are constitutive of it and which are causally impinging on it. To draw an analogy: one can easily deny that cardamom is essential to what makes a cake a cake; it is after all entirely possible to make a cake without cardamom. But once you have added cardamom to the dough, cardamom does not just causally impinge on the essential cake but becomes a constitutive part of the particular cake you are making (cf. de Haan, 2020). Analogously, it is possible to accept that bodily activity is not essential for perceptual agency, but still accept that when the body is involved, it is constitutively involved. In other words, the claim that bodily activity is not essential for perceptual agency does not amount to the claim that bodily activity can never be a constitutive part of attention. There is therefore a further question about whether parts of the environment and the non-brain body could, in principle, be realizers of mental phenomena.

Watzl takes enactivists (like Noë, 2004) to make a claim about the essence of perceptual agency: “Enactive theories locate the active element of perception in overt, bodily behavior and in implicit knowledge of sensory-motor contingencies” (Watzl, 2017, p. 42). Against enactivism, Watzl points to cases where “perception is active even when we do not move our bodies” (Watzl, 2017, p. 42), such as when you are listening to a friend without moving your body. Such cases of covert perceptual attention involve perceptual agency without involving bodily activity. Hence, for Watzl, bodily activity cannot be part of the essence of perceptual agency. We will return to the putative non-involvement of the body in covert perceptual attention in Sect. 4, but for now let us bracket it.

Watzl can then accept that many instances of perceptual attention do involve bodily movement, but that such bodily involvement is not necessary:

In some cases attending involves the body: there is overt visual attention (where you do move your eyes), tactile attention (“feeling”), gustatory attention (“tasting”), and olfactory attention (“sniffing”). But such involvement of the body need not be present in all forms of perceptual activity (Watzl, 2017, p. 46).

Elsewhere, he formulates the point as follows:

Purity: Consciously attending to something does not necessarily involve (a characteristic) bodily movement or posture, or an awareness of such a bodily movement or posture (Watzl, 2011, p. 147).

However, it is not clear to us whether enactivists are committed to the claim that all forms of perceptual agency involve bodily activity. Hurley (1998b), for example, investigates the question of whether token-explanatory vehicles of thought can go external. Vehicle externalism is a claim about what constitutes a particular token of thought, not about the minimal constitution basis for thoughts of that type. Hurley sees an interesting contrast between her position, vehicle externalism (the thesis that vehicles can extend), and vehicle internalism (the thesis that the vehicles of thought are essentially brainbound).

The question about the involvement of the body in attention is therefore not settled by the Purity claim. A further question can be asked about the nature of the contribution of bodily activity in those cases where bodily activity is involved in perceptual attention: Is this involvement only ever causal, or can bodily activity be constitutive of perceptual attention? If the former, then Watzl subscribes to the view that perceptual attention is essentially an inner action.

Watzl isn’t particularly clear on this. In an earlier publication, he states that “conscious attention is associated with and probably causally connected with bodily movement” (Watzl, 2011, p. 149), which suggests his view is that the question of constitutive bodily involvement in perceptual attention is not even worth considering. However, a careful reading of Structuring Mind reveals a different picture. In fact, although he does not explicitly say as much, we read Watzl as being committed to a substantial version of the embodied or extended mind thesis (Clark & Chalmers, 1998; Menary, 2010) when it comes to attention. In his analysis of overt visual attention, Watzl writes:

What it is for s to overtly visually attend to o is for s to regulate a priority system S such that a state of foveating on o is of top priority in S.

Where foveating on o is a state where one’s eyes are pointed so that o is visually represented at the fovea (roughly: the part of the eye with the highest visual resolution).

In other words, to overtly visually attend to something in your environment is to prioritize a particular embodied mental state (a state of pointing your eyes so that o is visually represented at the fovea). Hence, embodied mental states can be elements in our priority structures. Watzl does not elaborate on the consequences of this; he does not even flag it as a philosophically interesting move. This is surprising since elsewhere he considers the possibility of there being extended priority structures, i.e., structures that are ‘realized partly by processes outside the subject’s brain’ but chooses to remain neutral on this issue, commenting that ‘This may seem implausible to some theorists (though others may find it congenial.)’ (Watzl, 2017: 88). But Watzl’s separate discussion of overt visual attention reveals that he is in fact committed to there being extended priority structures.

This does not yet, however, settle the issue of the nature of the involvement of the body in perceptual attention. More specifically, it does not settle the issue of whether we should conceive of the mental action of perceptual attention as an inner action. Recall that on Watzl’s view attention is a complex phenomenon with different parts. On the one hand, there are the elements of the structure, which are mental states, processes, and activities. On the other hand, there is the structuring relation of priority, which pertains between the elements. Furthermore, there is the ongoing activity of regulating those structures. It is the mental action of actively regulating our priority structures that gives perceptual experience its agential character. Watzl’s discussion of overt visual attention reveals that he is committed to embodied or extended priority structures in the following sense: the elements of our priority structures can be embodied mental states, such as foveating on o. However, he does not say anything about the possibility of the priority relation itself or the activity of regulating those priority structures being embodied. As we will see, this is a mistake if he wants to be able to accommodate enactivist intuitions about perceptual agency.

3 Overt attention and the body

Consider some of the cases Watzl wants his account to do justice to: ‘moving your fingers across the fabric of a new coat and actively feeling its softness’ (ibid. 17), ‘sniffing something’ (ibid. 45), ‘looking for… a blackbird’ (ibid. 45). As Watzl points out, the body is clearly involved in such cases of perceptual agency (ibid. 46). The question is how we should think about the nature of the involvement.

Let us focus on the case of (voluntarily) sniffing something. The OED defines sniff as follows: ‘An act of sniffing; a single inhalation through the nose in order to smell something, usually accompanied by a characteristic short snuffling sound.’ So sniffing is a bodily action. But how might we square the bodily nature of sniffing with the idea that olfactorily attending to something by sniffing can be a mental action?

There are arguably two options available. The first option is to claim that the case of olfactorily attending to something by sniffing breaks down into two separate component actions: the outer bodily action of inhaling through the nose sharply (or something similar), and the inner mental action of prioritizing the olfactory experiences. On this view, the bodily and the mental action are distinct (though closely causally related) events or processes. The movements of the body facilitate the act of olfactory attention but are not themselves part of it. Analogously, the act of prioritizing the state of foveating on o requires two actions: the bodily act of moving one’s eyes to foveate on o and the mental act of prioritizing the state of foveating on o.

Alternatively, one might argue that the sniffing is a mental action with an overt-bodily action vehicle. On this view, the movements of the body in sniffing partly constitute the mental action of olfactorily attending to something. Analogously, the bodily act of foveating on o partly constitutes the mental action of visually attending to o. Although the type of mental action in question does not necessarily have to involve the body, the body is still constitutively and not just causally involved in the action.

We propose to call the first option the Conjunction Account (see Fig. 1). The account implies a particular division of labor between the bodily action and the mental action. The bodily action plays a role in delivering psychological elements to the mind; the mental action of paying attention organizes those elements into a priority structure. The body provides the input, and the mind actively structures the elements.

Fig. 1 Schematic outline of the Conjunction Account. The hybrid action consists of the conjunction of two separate, causally interacting actions: a bodily action and a mental action

An important implication of the Conjunction Account is that one ends up having to deny that sniffing something can itself be part of the activity of attending. The Conjunction Account separates overt olfactory attention into a separate bodily and mental action. Once that separation is in place, we have to say that sniffing something falls on the bodily side of the distinction. But then we must deny that the sniffing is itself part of the attentional action, and hence a mental action. On the Conjunction Account then, the bodily movements have exactly nothing to do with what makes perception active: it is inner mental (and not outer bodily) action that ‘infuses’ perceptual experience with agency.

This implication need not itself be a problem for Watzl (although he would need to change some of his formulations). However, we think it spells trouble for Watzl’s claim that the activity view of attention can accommodate ‘many of the intuitions that motivate enactivism’ (cf. Watzl, 2017, p. 45). The examples that motivate enactivism are of a kind that Watzl invokes, e.g., sniffing something or running one’s hands across the surface of something so as to feel its softness. The enactive intuition is, we take it, that in such cases the active character of perceptual experience derives from the active involvement of the body: it is because I move my hand over the fabric that I have an agentive perceptual experience. This is exactly something that needs to be denied on the Conjunction Account. At most, bodily activity can, through its causal interaction with mental activity, have an indirect impact on perceptual agency.

Another way to make the enactivist intuitions explicit is by looking at a piece of descriptive phenomenological psychology. As Watzl persuasively argues, we have ‘agentive awareness’ not only for bodily actions but also for mental actions, such as attendings (Watzl, 2017, pp. 232–4). Agentive awareness is a kind of non-observational and pre-cognitive awareness of engaging in an activity, and forms a core part of our sense of agency (Watzl, 2017: 229–232). If we take the Conjunction Account seriously and analyze overt attention in terms of two separate and simultaneously performed actions, then we would expect agentive awareness for engaging in two distinct actions: one mental and one bodily. Phenomenologically, this seems incorrect. When I attentively sniff something, it is not as though I experience myself as doing two things in quick succession or simultaneously: inhaling sharply through the nose and placing olfactory perceptions on top of a priority structure. In overt perceptual attention, we experience ourselves as engaged in a single unified action. This phenomenological insight is further corroborated by recent work on the philosophy and psychology of olfaction, which supports the idea that there is a much tighter connection between the embodied act of sniffing and olfactory attention than is suggested by the Conjunction Account (Barwich, 2020, pp. 153–160).

Despite these problems, it can seem like the Conjunction Account is the only game in town. After all – the reasoning goes – perceptual attention can be doubly dissociated from bodily behavior. On the one hand, the very same bodily action of sniffing (i.e., inhaling through the nose sharply) can be performed with or without an act of attention: I might sniff to attend to the smell of the brandy, but I might also sniff without attending olfactorily, say, because I have a cold. If the bodily movements are identical in both cases, then what makes the difference? The obvious answer is that the difference maker is something external to the bodily action itself: a separate mental action. An example of this kind of reasoning can be found in McClelland (2020, p. 17):

But is attending a mental act? Overt attention is the bodily activity of directing one’s sense organs toward a particular stimulus, property or region. Covert attention is the mental act of concentrating on a particular perceived stimulus, property or region… what makes the warning light distracting is that it pulls one’s concentration away from one’s work, not just one’s eyes.

On the other hand – the argument continues – there are cases of covert perceptual attention (i.e., cases of perceptual attention that do not involve a characteristic bodily action or posture). The temptation then is to generalize from cases of covert to cases of overt perceptual attention, and claim that even in cases where the body is involved, the nature of the involvement can only ever be causal and never constitutive. In other words, we have a case of two actions causally interacting. In the next section, we will look in some detail at covert perceptual attention. There we will undermine the standard considerations put forth in favor of the claim that covert perceptual attention does not constitutively involve the body, which in turn will block the above generalization. For now, let us focus on the argument that the double dissociation between bodily and mental activity provides a reason for going the conjunctive route.

There is an instructive parallel to be drawn here between the case of overt perceptual attention and a related case in the literature on mental action. When discussing the phenomenon of ‘thinking out loud’, Soteriou (2013) considers a version of what we have called the Conjunction Account. As he points out, saying things out loud is not sufficient for thinking out loud, since it is possible to say things out loud without thinking them. Moreover, it is possible to think something without saying it. Thus, we can be tempted to think that the missing ingredient in the case of saying things without thinking is a “distinct ‘inner’ process” of thinking (p. 240), and that the thinking out loud case involves ‘the conjunction of two separable actions – mental action plus bodily action.’ (p. 240).

However, as Soteriou then observes, this rests on an inference that is both mistaken and unforced. The point is that we are not compelled to adopt the Conjunction Account based on the double dissociation. There is another option available, which is to claim that

the verbal utterance instantiates two kinds of activity—overt bodily (talking out loud), and mental (calculating whether p)—in virtue of the fact that it instantiates a third, basic, non-reducible kind of activity, namely a mental activity with an overt-bodily-action vehicle (in this case, calculating whether p out loud) (Soteriou, 2013, p. 241).

In other words, on Soteriou’s analysis there is only one fundamental kind of activity, a mental activity with an overt-bodily-action vehicle, in virtue of which talking out loud instantiates both a bodily and a mental action. The same conceptual move is available in the case of overt olfactory attention. From the fact that there are cases of both covert attention and of bodily activity that does not involve attention, we need not conclude that in the cases of overt perceptual attention we find two separate actions occurring in parallel or in quick succession. Rather, the better option is to say that the act of olfactorily attending instantiates two kinds of action, overt bodily (inhaling through the nose sharply) and mental (prioritizing the olfactory perceptions), in virtue of instantiating a third, irreducible kind of action: a mental action with an overt-bodily-action vehicle. Let us call this the Constitution Account, and call such actions embodied mental actions (see Fig. 2).

Fig. 2 Schematic outline of the Constitution Account. An embodied mental action (or, equivalently, a mental action with an overt-bodily-action vehicle) instantiates both a bodily action and a mental action

The Constitution Account requires giving up on the idea that bodily and mental actions are mutually exclusive. Instead, the same event can instantiate both a mental and a bodily action. This raises some puzzles about how mental and bodily actions are to be distinguished. One proposal, offered by Soteriou, is that although mental actions can be constituted by bodily events, they are not exhausted by them. For an observer, it is directly observable that an agent is inhaling through their nose, but whether the bodily activity constitutes a mental action is not observable in the same way. This does not mean a distinct action over and above the bodily action is going on, however, just that mental actions are not exhausted by observable bodily activity. Ultimately, we do not require a sharp criterion for identifying mental actions. The cases that we have focused on – calculating whether p and attending – are prototypical mental actions. The question is how to construe the kind of involvement of the body in cases where the mental action is accompanied by bodily activity: thinking out loud and overt attention.

Let us take stock for a moment. In this section we have investigated the role of the body in overt attention. We have presented the Conjunction Account and the Constitution Account as two different ways in which mental and bodily action can be linked in overt attention. We have suggested that the Conjunction Account has a number of drawbacks that the Constitution Account lacks: it attributes the active element in perception exclusively to the ‘inner’ mental act of attending thus failing to capture enactivist intuitions, and predicts that in overt attention, one has agentive awareness for engaging in two distinct actions, which is not in line with the phenomenology. On the Constitution Account, on the other hand, overt acts of attending become embodied mental actions: it is the embodied mental action that infuses perceptual experience with agency.

This result puts pressure on Watzl’s solution to the puzzle of perceptual agency, and the way he contrasts his solution to enactivism. The central examples that motivate enactivism are of a kind that Watzl mentions, such as sniffing something or running one’s hands across the surface of something so as to feel its softness. The intuition that enactivists rely upon is that the active or agential character of these experiences derives, at least in part, from the involvement of the body. However, on the Conjunction Account, bodily movements have exactly nothing to do with what makes perception active: it is an inner mental (and not an overt bodily) action that, to use Watzl’s term, ‘infuses’ perceptual experience with agency.

If Watzl accepts the Constitution Account, then the opposition between mental and bodily action breaks down: the mental and the bodily “interpenetrate” (Dreyfus & Taylor, 2015). In particular, it leads to the commonsense view that the relevant forms of activity in overt perceptual attention are activities like looking, listening, touching, tasting, and sniffing. This is exactly what proponents of the bodily action route have been claiming (see Gibson, 1976, p. 234). The activity view ends up as a rather close cousin of enactivism and Gibsonian ecological psychology.

However, there is arguably something dissatisfying about the conclusion thus far. After all, the disagreement between the bodily activity and the mental activity views about perceptual agency can best be appreciated by considering covert perceptual attention. Let us therefore turn to examine covert perceptual attention.

4 Covert attention and the body

In this section, we will investigate the philosophical import of the existence of covert perceptual attention.Footnote 12 How to reconcile the embodied nature of overt perceptual attention with the supposedly disembodied nature of covert perceptual attention? One possibility is to opt for a disjunctive account of perceptual attention in which attention is realized by an embodied mental action in the overt case, and by an inner mental action in the covert case. Such a disjunctive position is only warranted when both types of attention are substantially different. Contra the disjunctive view, we will argue that there are important functional and explanatory similarities between overt and covert perceptual attention and hence that overt and covert perceptual attention are of a common kind.

It is important to distinguish a number of different claims about the embodiment of attention. On a Naïve Embodied View, actual overt movement of the body in the direction of the attended object is necessary for visual attention. A proponent of such a view would have to deny the very existence of covert attention. However, on a more Sophisticated Embodied View, covert attention might not involve overt bodily movement, but still involve the same motor processes that underlie overt attention. For example, on the pre-motor theory of attention (Craighero & Rizzolatti, 2005; Rizzolatti et al., 1987), visual attention to a target location does not require an overt movement of the eyes, but requires that there is motor-preparation of the relevant eye movement.Footnote 13 The Sophisticated Embodied View contrasts with what we call the Independence View, the view that visual attention is completely independent of the position of the eyes.

The transition from the Naïve Embodied View to the Sophisticated Embodied View is a significant one. On the sophisticated view, embodiment is not understood in terms of actual overt bodily movement, but in terms of the activation of preparatory motor processes that can occur in the absence of actual bodily behavior. Still, this is not an ad hoc move in the debate. When criticizing enactive approaches, Watzl (2017) clearly has both the Naïve and the Sophisticated Embodied View as his target:

The view that all perceptual experience is constitutively dependent on either bodily behavior itself, or on motor preparation systems in our brains, is problematic. (p.42, emphasis added)

We therefore propose that the relevant philosophical debate is between the Sophisticated Embodied View and the Independence View.

In this section, we limit ourselves to analyzing covert visual attention and covert auditory attention. We take these to be the hardest cases from the perspective of embodied cognition. It is, for example, unclear whether there is even such a thing as covert olfactory or covert gustatory attention.

4.1 Covert visual attention

The main motivation for thinking that there is covert visual attention comes from cases like the Posner paradigm (Posner, 1980). The Posner paradigm (Fig. 3) is standardly introduced in the literature to motivate the very existence of covert attention (Watzl, 2011; Wu, 2014). In the Posner paradigm, the eyes are kept fixed at a fixation point and the subject is cued to visually attend to a region of space without moving their eyes. The cue can be either shown at the point of fixation (endogenous cue) or shown outside of the point of fixation (exogenous cue). After the cue disappears, a stimulus arises either congruent with the cue (valid cue) or incongruent with the cue (invalid cue). The subject then responds as quickly as possible to the onset of the stimulus.

Fig. 3

Schematic depiction of the Posner paradigm. See text for explanation (created by Local870, CC BY 3.0)

When the subject is presented with a left cue, a stimulus presented to the left is responded to more quickly than a stimulus presented to the right (and vice versa).

The philosophical significance of the Posner paradigm is that the same bodily posture – the eyes fixated at the cross – can support different visual attentional states. Watzl takes examples like the Posner cases to support the view that although overt visual attention does involve the body, bodily activity is not an essential part of perceptual attention (Watzl, 2011: 148-9, cf. Watzl, 2017: 44).Footnote 14

However, there are several problems with this argument. Firstly, it is important to point out that Posner’s definition of covert attention differs from Watzl’s. Posner defines covert attention as ‘attention to a position in visual space other than fixation’ (Posner, 1980, p. 3). Posner’s definition of covert attention holds that a subject can attend to a position in visual space distinct from the fixation point of the eyes. Watzl’s definition of covert attention holds that the subject can do so without any observable bodily movement. This is a much stronger claim. And while Posner’s definition is fully compatible with the empirical literature, Watzl’s definition runs into problems.

There is a rich literature on the role of microsaccades in covert attention. Microsaccades are small eye movements that happen during fixation. Initially they were thought to be random artifacts of the oculomotor system, but research has shown that they are directly correlated with the direction of covert attention (Hafed & Clark, 2002). If this research holds, and there is considerable evidence to back up this claim (Barnhart et al., n.d.; Engbert & Kliegl, 2003; Hafed & Clark, 2002; Rolfs, 2009; Ryan et al., 2019; Yuval-Greenberg et al., 2014), then microsaccades are overt bodily signs associated with covert shifts of attention. This makes covert attention (in Posner’s sense) a case of overt attention (in Watzl’s sense).

Still, the question of whether covert attention involves observable bodily signs is separate from the question whether covert attention is embodied. It might be, for example, that microsaccades are just a byproduct of attention and not a functional part.

A historical precursor of the debate between the Independence View and the Sophisticated Embodied View can be found at the end of the 19th century in discussions between Helmholtz and James. In the Attention chapter of The Principles of Psychology (1890) James writes about covert attention (although he does not call it that): “we may attend to an object on the periphery of the visual field and yet not accommodate the eye for it” (p.437). In this passage, James discusses the interpretation of Helmholtz’s experiments on what we now call covert attention. For Helmholtz, the fact that we can direct our attention without moving our eyes means that:

[o]ur attention is quite independent of the position and accommodation of the eyes, and of any known alteration in these organs; and free to direct itself by a conscious and voluntary effort upon any selected portion of a dark and undifferenced field of view. This is one of the most important observations for a future theory of attention. (Helmholtz, as quoted by James, 1890, p.438)

Helmholtz here articulates what we have called the Independence View of perceptual attention. Attention can be directed at will independently of the position of the eyes. James is skeptical about this interpretation. He approvingly cites Hering:

Whilst attending to the marginal object we must always attend at the same time to the object directly fixated. If even for a single instant we let the latter slip out of our mind, our eye moves towards the former […]. (Hering, as quoted by James, 1890, p.438)

James adds:

Accommodation exists here, then, as it does elsewhere, and without it we should lose a part of our sense of attentive activity. In fact, the strain of that activity (which is remarkably great in the experiment) is due in part to unusually strong contractions of the muscles needed to keep the eyeballs still, which produces unwonted feelings of pressure in those organs (James, 1890, p. 438).

James is making two points here. Firstly, there is the physiological point that strong contractions of the muscles are needed in order to inhibit foveating towards the peripheral object. Secondly, there is the descriptive phenomenological point that part of the “sense of the attentive activity” in this case derives from the felt strain of keeping the eyes fixed. Recall Watzl’s claim that we have a sense of agency for attending. James can be read as making the argument that our sense of agency for covert visual attention includes the sense of actively inhibiting foveation.

James’s analysis allows us to provide an alternative interpretation of Posner cases. Covert visual attention requires the (demanding) action of keeping one’s eyes fixated at the cross and inhibiting foveating towards the marginal object. It seems uncontroversial that keeping the eyes fixed is a voluntary intentional action. If this is right, then covert visual attention involves intentional omission (Clarke, 2010; Shepherd, 2014). Shepherd (2014) argues that intentional omissions are describable as intentional actions. For example, when a child plays hide and seek, she may hold still for several minutes. Holding still in this case is both an intentional omission and an intentional action: “it requires the sending of a pattern of motor signals to certain muscles, perhaps the inhibition of other motor signals, the maintenance of balance, with fine adjustments made in response to feedback” (Clarke, 2010, p. 159). Building on James, we propose that keeping your eyes fixed during covert attention functions analogously to standing still during hide and seek: it is an effortful intentional action. Both cases involve a characteristic bodily posture and are thus examples of intentional embodied action.

An objection here might be that we should think about this case as involving two distinct actions: (a) the action of holding the eyes fixed and (b) the action of paying visual attention to the marginal object. If so, then it would seem natural to describe the first action as bodily, and the second as (inner) mental action. However, this would be a mistake. The case of covert visual attention is not analogous to other cases where you perform two actions at once, such as raising your arm whilst wiggling your toes. This analogy fails to capture the relationship of dependence that exists between the holding of the eyes in place and the covert attention. Performing a Posner-style act of covert visual attention requires holding one’s eyes in place. Indeed, it is part of what you are doing when you covertly attend visually. We propose to analyze this case as a single, complex embodied mental action.

Slight variations of the Posner paradigm provide empirical evidence for the Sophisticated Embodied View over the Independence View. One elegant line of argument comes from Craighero et al. (2004). In their experiment, Craighero et al. compared the original Posner paradigm with a condition in which the subject was rotated with respect to the screen on which the stimuli were shown. Because the eyes rotate along with the screen, the stimulation of the retina was exactly the same in both conditions (see Fig. 4). The consequence of this rotation is that although the subject could still see the stimulus boxes in their visual field, they could only foveate toward the nasal stimulus box (i.e., towards the nose), while the temporal stimulus box (i.e., towards the temple) could not be foveated towards.

Fig. 4

The rotated Posner paradigm (figure adapted from Craighero et al., 2004)

If the Independence View is right, covert attention should not be dependent on the position of the eyes. If the Sophisticated Embodied View is right, however, then covert attention would be mediated by whether the attended region of space can be foveated towards. The authors found that in the rotated condition, the Posner effect remains intact for the nasal condition (in which the target is foveatable) but breaks down for the temporal condition (in which the target is not foveatable). They conclude that visual “attention cannot be directed toward spatial locations that are difficult for the eyes to access” (p.332), supporting the pre-motor theory of attention and hence the Sophisticated Embodied View.

We have traced two lines of argument for the Sophisticated Embodied View of covert visual attention. First, covert visual attention requires the inhibition of the tendency to foveate towards a target. We have argued that this inhibition of foveation itself qualifies as an embodied action. Second, variations of the Posner paradigm show that although the actual movement in covert cases is inhibited, attention still bears the traces of the motor process: locations that are hard to foveate to are hard to covertly attend to.

4.2 Covert auditory attention

How might the Sophisticated Embodied View that we have argued for in this section translate to perceptual attention in other modalities? In particular, the well-established phenomenon of covert auditory attention (e.g., Broadbent, 1958; Cherry, 1953) does not seem to require the inhibition of a tendency to orient and might therefore support the Independence View.

Auditory attention is an interesting case because humans and other higher primates have lost the ability to auditorily orient by changing the shape and direction of the pinna (the part of the ear that is outside of the head). Cats and dogs, for example, have retained this ability. Although this is a bit of speculation, it is quite likely that covert auditory attention in a species with the ability to move their ears, such as cats and dogs, does require an active inhibition to orient, similarly to covert visual attention in humans.

Despite the loss of this ability, it has been proposed that a system for controlling ear movements lies, in vestigial form, in the human brain (Hackley, 2015; Stekelenburg & van Boxtel, 2002; Strauss et al., 2020). In a recent paper, Strauss et al. (2020) recorded activation of the auricular muscles while subjects attended covertly to stimuli coming from different positions in space. Even though the ears barely move during auditory attention tasks, Strauss et al. found that the activation of the muscles around the ear is indicative of the direction in which attention was covertly directed.

Based on current evidence, it is speculation whether the activity in the system that controls auricular muscles is a byproduct of auditory attention, or is itself responsible for covert auditory attention. Strauss et al., for example, conclude that the “exploration of the relationship between human auriculomotor activity and subtle markers of covert attention […] has scarcely begun” (Strauss et al., 2020, p.13). Even though humans lost the ability to orient their ears approximately 25 million years ago, it might still very well be the case that the premotor theory holds for covert auditory attention.

According to the analysis presented in this section, the existence of covert perceptual attention does not show that bodily involvement is not required for perceptual agency but actually highlights the involvement of the body in covert attention in multiple ways.

5 Revisiting perceptual agency

We started out the paper by introducing two routes to resolving the puzzle of perceptual agency. Either perception is active because it is entangled with bodily action, or it is active because it is infused with mental action. The former position is associated with embodied approaches to cognition, while the latter is associated with internalist approaches to cognition. However, this sorting presupposes a coherent distinction between ‘mental action’ and ‘bodily action’. In classical debates about the active nature of perception, the mental/bodily distinction is aligned with the inner/outer distinction. Mental action is then an ‘inner action’ operating on internal mental elements (e.g., representations) (Gibson, 1976; Richards, 1976).

The association of the mental with an inner domain has been criticized in the literature on both enactivism and mental action: mind and body interpenetrate (Dreyfus & Taylor, 2015). We have argued that, by analogy with ‘thinking out loud’, overt perceptual attention should be understood as an embodied mental action. Overt acts of attention such as sniffing and visually orienting are not made up of a conjunction of two separate actions (an ‘inner’ mental and an ‘outer’ bodily) but are unitary actions that instantiate two act sub-types: a bodily and a mental act. The fact that it is possible, for example, both to move one’s eyes in the direction of o without visually attending to o, and to visually attend to o without moving the eyes in its direction, does not warrant the conclusion that we have here two separate actions: one mental and one bodily.

Allowing that there are such things as embodied mental actions has substantial implications for the debate on perceptual agency. We can now articulate two different versions of the mental action route to thinking about perceptual agency. On an essentially internalist understanding of mental action, it is an inner action that makes perception active. This is the view of mental activity advocated by Richards (1976) and which has a long history in both philosophy and psychology. The inner mental action route is wedded to analyzing overt attention as a conjunction of two separate actions: overt bodily and inner mental. The (counterintuitive) consequence of this view is that bodily action does not contribute to perceptual agency in overt attention: if the mental and bodily action are distinct, and it is mental action that infuses perception with agency, then bodily action is screened off from perception and does not contribute to its agential character. You can peer and sniff all you like, but your peering and sniffing will have exactly nothing to do with what makes the vision or olfaction active.

On an embodied understanding of mental action, perception is active because it is infused with embodied mental action. The embodied mental action route combines the strengths of the other two. It agrees with the inner mental action route both that mere bodily movement is not sufficient for perceptual agency (after all, we can move our body without a corresponding act of attention) and that in order to pay perceptual attention to something it is not necessary that we orient our perceptual organ towards it (after all, there are cases of covert perceptual attention). But it agrees with the bodily action view that when we are overtly attending, our active embodied movement contributes to our perceptual agency. Furthermore, the view can be expanded so as to capture the important sense in which canonical cases of covert perceptual attention are not inner mental actions but rather constitutively depend on our embodiment.

The embodied mental action route to perceptual agency is not committed to each and every form of perceptual attention being an embodied mental action (it is neutral on this). In the covert attention section, we have focused on visual and auditory covert attention, as we take these to be the hardest cases to account for from the perspective of embodied cognition. We have argued that an analysis of covert visual attention does not support an account on which covert attention is independent from bodily action. Instead, covert visual attention involves the body in multiple ways. Even in the auditory domain, auriculomotor activity is involved in covert auditory attention. Whether auriculomotor activity is covert auditory attention, or whether it is a consequence of it, is hard to say given currently available evidence, but an inner action view of covert auditory attention is by no means mandated.

On the other hand, we would warn against assuming that each and every form of overt perceptual attention in its different modalities has a covert equivalent. For example, it is not clear that there is covert olfactory or covert gustatory attention. Likewise, it is probably highly misleading to think there is a covert equivalent of tactile attention in anything like the sense that there is a covert equivalent of overt visual and auditory attention.

In principle then, it should be possible to make a typology of necessarily embodied, contingently embodied and disembodied acts of attending. However, we suggest that in the end pursuing such a typology would not be helpful. It is better to give up on the idea that the mental action/bodily action distinction tracks a philosophically deep distinction when it comes to perception. In a discussion of mental action, Levy (2019) ends up with a similar conclusion: “Either the distinction between mental and bodily acts is specious, or it is unimportant” (Levy, 2019, 984). If we try to draw a contrastive definition, it will end up being specious, since we will fail to capture cases of so-called mental action where the body is part of the activity (e.g., multiplying two numbers using pencil and paper or thinking out loud). Alternatively, we might treat the distinction as shallow, in which case it will not do much explanatory or philosophical work.

Levy’s point is that mental and bodily acts “exhibit key functional and explanatory similarities” (Levy, 2019, p.984). Whether you are engaging in a thoughtful inner monologue with or without moving your lips will only make a difference in explaining behaviour in highly contrived circumstances. Levy goes on to suggest that whereas the mental action/bodily action distinction is not a deep and philosophically significant one, there are important questions about the possible philosophical significance of the distinction between overt and covert forms of activity. As he points out, ‘Such topics as the nature of covert acts, the possible significance of their being covert, and the implications their covertness may have for understanding agency and related notions are currently much under-explored’ (ibid. 987).

Returning to perceptual agency, although Watzl and Eilan set up the contrast between the enactive and the attention-based approaches to perceptual agency in terms of the difference between bodily and mental action, as we have seen, this would only be a philosophically significant distinction if we understood mental action as inner action. It is this notion of mental activity that is at stake in the debate between Richards and Gibson.Footnote 15 However, as we have shown, the implication of the inner action view is that it locates the active element in perception in the inner mental action of attending, even in cases where the body is involved. Enactive approaches locate the active element in perception in minded embodied activity. Watzl therefore cannot hold on to the inner action view of attention and simultaneously account for enactive intuitions. In order to account for these intuitions, Watzl must adopt the embodied mental action, rather than the inner mental action, route to theorizing perceptual agency. But this means he ends up closely aligned with enactivism and other embodied approaches.

The embodied mental action route comes with substantial commitments about the embodiment of attention. To say that the mental can interpenetrate with the body is a departure from the orthodox view in which mental activity is an internal activity. The activity of structuring mind – of attending – can be constituted, in part, by embodied mental actions. The consequence of endorsing the embodied mental action route is vehicle externalism (Hurley, 1998b) about the mental act of perceptually attending.

Finally, in the spirit of further bridging the activity view of attention and embodied and enactive theories of cognition, we would like to point out a number of points of convergence. Occasionally Watzl comes close to concepts that are at the heart of enactive and embodied theories, namely, those of affordance and solicitation. For example, he claims that the activity of structuring the field of experience is sometimes guided by phenomenal salience, which is similar to a solicitation (Watzl, 2014). Moreover, salience is not experienced individually but has a holistic field-like organization (Watzl, 2017, Chap. 9). Watzl’s suggestion here fits in with a growing interest in the philosophy of mind in the idea of affordances for both mental and bodily actions (McClelland, 2020; Bruineberg & Van den Herik, 2021). However, according to our argument the idea of a (disembodied) mental affordance is inappropriate when it comes to perceptual attention. We suggest that the idea that embodied mental action plays a role in structuring the field of experience is congenial to recent enactive work on the field of affordances (Rietveld & Kiverstein, 2014; Bruineberg & Rietveld, 2014). Rather than defining itself in contrast to enactive approaches, the activity view of attention could benefit from the conceptual apparatus provided by embodied and enactive approaches (Dreyfus, 2002; Gibson & Rader, 1979; J. J. Gibson, 1979; Hurley, 1998a; Merleau-Ponty, 1945; Rietveld & Kiverstein, 2014; Thompson, 2007).