Efficient Query Processing for Scalable Web Search, Foundations and Trends in Information Retrieval

Efficient Query Processing for Scalable Web Search
Foundations and Trends in Information Retrieval (IF 10.4), Pub Date: 2018-12-22, DOI: 10.1561/1500000057
Nicola Tonellotto, Craig Macdonald, Iadh Ounis

Search engines are exceptionally important tools for accessing information in today’s world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to evolve rapidly, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to gain efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to the basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, and covers the latest trends in the literature on efficient query processing, including coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists, as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trend of applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction) and making per-query trade-offs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met.
Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, and real-time, energy-efficient and modern hardware and software architectures.

Updated: 2018-12-22
