Data Imbalances in Coincidence Analysis: A Simulation Study,Sociological Methods & Research



Sociological Methods & Research (IF 4.677), Pub Date: 2024-03-19, DOI: 10.1177/00491241241227039
Martyna Daria Swiatczak, Michael Baumgartner


In this paper, we investigate the conditions under which data imbalances, a common data characteristic that occurs when factor values are unevenly distributed, are problematic for the performance of Coincidence Analysis (CNA). We further examine how such imbalances relate to fragmentation and noise in data. We show that even extreme data imbalances, when not combined with fragmentation or noise, do not negatively affect CNA's performance. However, an extended series of simulation experiments on fuzzy-set data reveals that, when mixed with fragmentation or noise, data imbalances may substantially impair CNA's performance. Furthermore, we find that the performance impairment is greater when endogenous factors are imbalanced than when exogenous factors are imbalanced. Our results allow us to quantify these impacts and to demarcate the degrees at which data imbalances should be considered problematic. Applied researchers can thus use our demarcation guidelines to enhance the validity of their studies.
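The abstract defines a data imbalance as an uneven distribution of factor values. As a rough illustration only, the sketch below scores how unevenly a single fuzzy-set factor is distributed across cases. It assumes that a case counts as instantiating the factor when its membership score is strictly above a 0.5 crossover; the functions `presence_share` and `imbalance` and the scoring rule are hypothetical choices for this example, not the measure used in the paper.

```python
from typing import Sequence


def presence_share(memberships: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases in which a fuzzy-set factor counts as present.

    Assumption (hypothetical): a case instantiates the factor when its
    membership score is strictly above `threshold`.
    """
    if not memberships:
        raise ValueError("at least one case is required")
    if any(not 0.0 <= m <= 1.0 for m in memberships):
        raise ValueError("fuzzy-set memberships must lie in [0, 1]")
    present = sum(1 for m in memberships if m > threshold)
    return present / len(memberships)


def imbalance(memberships: Sequence[float], threshold: float = 0.5) -> float:
    """Hypothetical imbalance score: 0.0 for an even present/absent split,
    1.0 when every case falls on the same side of the threshold."""
    return abs(2 * presence_share(memberships, threshold) - 1)


if __name__ == "__main__":
    balanced = [0.9, 0.2, 0.7, 0.1]         # two cases present, two absent
    skewed = [0.9, 0.8, 0.7, 0.95, 0.6, 0.3]  # five of six cases present
    print(imbalance(balanced))  # 0.0
    print(imbalance(skewed))
```

A score near 1.0 flags a factor whose values sit almost entirely on one side of the crossover. Per the abstract, such a factor is not problematic by itself but becomes so when combined with fragmentation or noise.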


Updated: 2024-03-19
