Still no lie detector for language models: probing empirical and conceptual roadblocks
Philosophical Studies, Pub Date: 2024-02-17, DOI: 10.1007/s11098-023-02094-3
Benjamin A. Levinstein, Daniel A. Herrmann

We consider the questions of whether or not large language models (LLMs) have beliefs, and, if they do, how we might measure them. First, we consider whether or not we should expect LLMs to have something like beliefs in the first place. We consider some recent arguments aiming to show that LLMs cannot have beliefs. We show that these arguments are misguided. We provide a more productive framing of questions surrounding the status of beliefs in LLMs, and highlight the empirical nature of the problem. With this lesson in hand, we evaluate two existing approaches for measuring the beliefs of LLMs, one due to Azaria and Mitchell (The internal state of an LLM knows when it's lying, 2023) and the other to Burns et al. (Discovering latent knowledge in language models without supervision, 2022). Moving from the armchair to the desk chair, we provide empirical results that show that these methods fail to generalize in very basic ways. We then argue that, even if LLMs have beliefs, these methods are unlikely to be successful for conceptual reasons. Thus, there is still no lie detector for LLMs. We conclude by suggesting some concrete paths for future work.
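The abstract names, but does not spell out, the kind of probing method it evaluates. Below is a minimal, self-contained sketch (not the authors' code; synthetic vectors stand in for real LLM hidden activations) of the general supervised-probe setup in the spirit of Azaria and Mitchell's approach: fit a classifier on hidden-state vectors of true and false statements from one topic, then check whether it transfers to statements from a different topic, where the truth-relevant feature may be encoded differently.

```python
# Illustrative sketch only: synthetic "activations" replace real LLM hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 64

def fake_activations(n, truth, direction):
    """Stand-in for hidden-state vectors: truth is encoded along `direction`."""
    signs = np.where(truth == 1, 1.0, -1.0)[:, None]
    return signs * direction[None, :] + rng.normal(scale=1.0, size=(n, dim))

# In this toy setup the "truth feature" lives along different directions for
# the training topic and the held-out topic, so a probe fit on one topic
# carries no information about the other.
dir_train = np.zeros(dim); dir_train[: dim // 2] = 0.5
dir_test = np.zeros(dim); dir_test[dim // 2 :] = 0.5

y_train = np.array([1] * 200 + [0] * 200)
X_train = fake_activations(400, y_train, dir_train)

y_test = np.array([1] * 200 + [0] * 200)
X_test = fake_activations(400, y_test, dir_test)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("same-topic accuracy:", probe.score(X_train, y_train))  # high
print("new-topic accuracy :", probe.score(X_test, y_test))    # near chance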



Updated: 2024-02-17