Methods of confounder selection in obstetrics and gynaecology studies: An overview of recent practice, BJOG: An International Journal of Obstetrics & Gynaecology

BJOG: An International Journal of Obstetrics & Gynaecology (IF 5.8), Pub Date: 2024-03-01, DOI: 10.1111/1471-0528.17799
Peter M. Socha, Sam Harper, Jennifer A. Hutcheon

Identifying confounders and selecting variables to include when controlling for confounding (‘adjustment sets’) are key methodological decisions in studies that estimate the effect of an intervention or exposure, and relevant reporting guidelines suggest including the justification for how confounders were selected (Strengthening the Reporting of Observational Studies in Epidemiology, STROBE, item 16a).¹ Excluding confounders or including variables that are not confounders in adjustment sets can lead to spurious findings. For example, because pre-eclampsia influences gestational age at birth, adjusting for gestational age can make pre-eclampsia appear protective against cerebral palsy.² Contemporary approaches to selecting adjustment sets use content knowledge about the known or likely causal relationships between potential confounders and the exposure and outcome (e.g. using a directed acyclic graph, DAG).²,³ At the same time, data-driven methods for identifying adjustment sets (e.g. selecting variables that are significantly associated with the outcome) continue to be used even though they cannot reliably identify confounders.⁴ Our objective was to describe the prevalence of different approaches to selecting adjustment sets in a recent sample of non-randomised studies in obstetrics and gynaecology.
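
The gestational-age example can be illustrated with a toy simulation. The sketch below is not the analysis from the cited work: every variable name and coefficient is invented, and it assumes one particular mechanism, namely an unmeasured factor `u` that both shortens gestation and raises a continuous outcome risk score, while the exposure has no effect on the outcome at all. Under those assumptions, adding gestational age to the regression makes the exposure look protective.

```python
import numpy as np


def simulate_adjustment_bias(n=200_000, seed=0):
    """Toy linear simulation; all coefficients are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                           # unmeasured cause of early birth and of the outcome
    e = rng.binomial(1, 0.1, size=n).astype(float)   # exposure (e.g. pre-eclampsia), no effect on y
    g = -1.0 * e - 1.0 * u + rng.normal(size=n)      # gestational age: lowered by exposure and by u
    y = 1.0 * u + rng.normal(size=n)                 # outcome risk score: depends on u only

    def exposure_coefficient(covariates):
        # Ordinary least squares with an intercept; the exposure is the first covariate.
        X = np.column_stack([np.ones(n)] + covariates)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta[1]

    crude = exposure_coefficient([e])
    adjusted = exposure_coefficient([e, g])
    return crude, adjusted


crude, adjusted = simulate_adjustment_bias()
print(f"crude exposure coefficient:    {crude:+.3f}")
print(f"adjusted for gestational age:  {adjusted:+.3f}")
```

With these made-up coefficients the crude coefficient sits close to zero, as it should, while the coefficient adjusted for gestational age is clearly negative. The apparent protection comes entirely from conditioning on a variable that the exposure affects.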

We reviewed all full-length original research articles from January 2022 through June 2022 in three obstetrics and gynaecology journals (American Journal of Obstetrics and Gynaecology, British Journal of Obstetrics and Gynaecology and Obstetrics & Gynaecology). We included non-randomised studies investigating the relationship between an intervention/exposure and a health outcome. We excluded descriptive studies (e.g. describing trends), predictive studies (e.g. identifying high-risk patients) and studies where the aim was unclear. We classified studies as using a data-driven method if the final adjustment set was determined in whole or in part using data-driven methods, and categorised these methods into significance testing (e.g. selecting variables associated with the outcome) and change-in-estimate approaches (e.g. selecting variables that, when included, changed the estimate of interest by 10%). We classified studies as using content knowledge if the authors reported specifying confounders a priori or discussed the relationship between at least one potential confounder and the exposure and outcome.
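
To make the change-in-estimate rule concrete, here is a minimal sketch of how such a screen is often computed. The function names are hypothetical, and two details are assumptions rather than anything stated above: the change is measured relative to the crude estimate, and "by 10%" is read as "by at least 10%". The sketch shows what the rule does; it is not a recommendation to use it.

```python
def relative_change(crude, adjusted):
    """Absolute change in the estimate, as a fraction of the crude estimate."""
    if crude == 0:
        raise ValueError("relative change is undefined for a crude estimate of zero")
    return abs(adjusted - crude) / abs(crude)


def flagged_by_change_in_estimate(crude, adjusted, threshold=0.10):
    """True if adding the candidate variable moved the estimate by at least `threshold`."""
    return relative_change(crude, adjusted) >= threshold


# Hypothetical estimates before and after adding one candidate variable.
print(flagged_by_change_in_estimate(crude=2.0, adjusted=1.7))  # 15% change -> True
print(flagged_by_change_in_estimate(crude=2.0, adjusted=1.9))  # 5% change  -> False
```

The rule only looks at how much the estimate moves, not why. A variable caused by the exposure can shift the estimate just as much as a genuine confounder, so it would be retained by this screen.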

Of the 252 studies published during our study period, 129 met inclusion criteria. We excluded 29 descriptive or predictive studies, eight studies with unclear aims (e.g. ‘risk factor’ studies) and 86 studies with other aims or designs (e.g. trials, validation studies). The results are summarised in Table 1. Almost half of the included studies (44%) did not explicitly state what approach was used to identify their adjustment sets. Of the 72 studies that were more explicit in their justification, 39% used data-driven methods to select some or all variables in the adjustment set. Of the 44 studies that used content knowledge, 16% used a DAG.

TABLE 1. Reported methods used to identify confounding factors and select adjustment sets in N = 129 recent non-randomised studies in obstetrics and gynaecology.

Reported method	n (%)
Data-driven	28 (22)
  Significance testing	25 (19)
  Change-in-estimate	4 (3.1)
Content knowledge alone	44 (34)
  Used a DAG	7 (5.4)
  Reported a DAG	3 (2.3)
Unclear/unreported/no justification	57 (44)

Abbreviation: DAG, directed acyclic graph.
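
The percentages quoted in the results use different denominators (all 129 included studies, the 72 with an explicit justification, or the 44 using content knowledge). The short check below recomputes them from the counts given in the text and Table 1; the variable names are ours.

```python
# Counts transcribed from the text and Table 1.
published = 252
excluded = {"descriptive_or_predictive": 29, "unclear_aims": 8, "other_aims_or_designs": 86}
included = published - sum(excluded.values())

table = {"data_driven": 28, "content_knowledge_alone": 44, "unclear_or_unreported": 57}
used_dag = 7


def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


explicit = table["data_driven"] + table["content_knowledge_alone"]

print(included)                                         # 129
print(pct(table["unclear_or_unreported"], included))    # 44 (% of all included studies)
print(explicit)                                         # 72
print(pct(table["data_driven"], explicit))              # 39 (% of studies with explicit justification)
print(pct(used_dag, table["content_knowledge_alone"]))  # 16 (% of content-knowledge studies)
```

The significance-testing and change-in-estimate rows sum to 29 rather than 28, which suggests that one study used both approaches.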

We found that justifications for the selection of adjustment sets were often unclear and the use of data-driven methods to select adjustment sets was common in recent studies in obstetrics and gynaecology. Data-driven methods for selecting adjustment sets can cause bias by excluding confounders or including factors caused by the exposure or outcome, but can be useful for variable selection in predictive models.²⁻⁴ The persistent use of these methods for causal questions may result from a combination of a lack of awareness of the potential for bias and a lack of clarity on the distinctions between descriptive, predictive and causal questions.⁵ Content knowledge is required to select adjustment sets, and DAGs offer a compelling framework for clearly reporting underlying modelling assumptions (for examples from obstetrics and gynaecology, see Ananth and Schisterman).² For studies that consider a large number of potential confounders, a more pragmatic approach than presenting a single DAG is to simply include all variables that are (or are proxies for) common causes of the exposure and outcome (for detailed criteria, see VanderWeele).³
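
The pragmatic rule in the last sentence can be written down as a simple filter over investigator-supplied annotations. The sketch below is only an illustration of that one sentence, not the detailed criteria in the cited reference. The `Candidate` class, the field names and the example annotations are all hypothetical: they are placeholders for content knowledge, not clinical claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Content-knowledge annotations for one candidate variable (hypothetical schema)."""
    name: str
    causes_exposure: bool = False
    causes_outcome: bool = False
    proxy_for_common_cause: bool = False
    caused_by_exposure_or_outcome: bool = False


def select_adjustment_set(candidates):
    """Keep common causes of exposure and outcome, or proxies for them.

    Variables caused by the exposure or the outcome are never selected.
    """
    selected = []
    for c in candidates:
        if c.caused_by_exposure_or_outcome:
            continue
        is_common_cause = c.causes_exposure and c.causes_outcome
        if is_common_cause or c.proxy_for_common_cause:
            selected.append(c.name)
    return selected


# Illustrative annotations only; a real study would justify each one.
candidates = [
    Candidate("maternal_age", causes_exposure=True, causes_outcome=True),
    Candidate("neighbourhood_deprivation_index", proxy_for_common_cause=True),
    Candidate("gestational_age_at_birth", causes_outcome=True, caused_by_exposure_or_outcome=True),
    Candidate("outcome_only_risk_factor", causes_outcome=True),
]

print(select_adjustment_set(candidates))
# ['maternal_age', 'neighbourhood_deprivation_index']
```

Nothing in this filter looks at the data: each decision rests on a stated causal assumption, and those assumptions are what a reader needs to see reported.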

Updated: 2024-03-01
