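Summary:
This paper builds a paper search system on top of PARADISE (Platform for Applying, Researching And Developing Intelligent Search Engine), the search engine platform of 天網實驗室 (Tianwang Lab), using more than 2,500 computer networking papers crawled from portal.acm.org as data. The ultimate goal is to use the citation relations between papers to collect what the authors of citing papers say about a given paper, forming a short evaluation paragraph, and to produce Impact-based Summaries, so that readers can learn the content, strengths, and weaknesses of the paper from an expert perspective. We first crawl the citation relations between articles from portal.acm.org, then use an algorithm to obtain a set of candidate sentences that evaluate a given article, and rank these sentences by importance to produce a short evaluation passage. We also build a language model that uses the candidate sentence set to score the sentences of the original paper; the highest-scoring sentences form the impact-based summary of the original paper.
Key words
Search engine, paper evaluation, language model, KL-divergence algorithm, impact-based summaries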
Abstract
In this paper, we build a paper search engine on the PARADISE (Platform for Applying, Researching and Developing Intelligent Search Engine) platform, using data from more than 2,500 papers in the area of computer networks. Our goal is to obtain comments on a paper, together with an impact-based summary of it, from the citation relations between papers. We first extract candidate sentences that comment on the cited paper and assemble them into a citation context. We then construct a language model that uses the citation context to score the sentences of the cited paper, and select the highest-scoring sentences as its impact-based summary.
Key words
Search Engine, Paper Comment, Language Model, KL-divergence Scoring, Impact-based Summaries
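The abstract names KL-divergence scoring against a language model built from the citation context, but gives no formula. The sketch below is one plausible reading, not the system's actual implementation: it estimates a unigram "impact" model from the citing sentences and scores each sentence of the cited paper by the negative KL divergence from that model to the sentence's own model. The function names, the add-one background model, the Dirichlet smoothing, and the value of `mu` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def background_model(tokens):
    """Add-one smoothed unigram model over every word seen in `tokens`."""
    counts = Counter(tokens)
    total = len(tokens) + len(counts)
    return {w: (c + 1) / total for w, c in counts.items()}


def smoothed_model(tokens, background, mu):
    """Unigram model of `tokens`, Dirichlet-smoothed toward `background`."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: (counts[w] + mu * p) / (n + mu) for w, p in background.items()}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())


def impact_summary(paper_sentences, citing_sentences, k=2, mu=10.0):
    """Return the k paper sentences closest to the citation-context model.

    Assumed scoring: score(s) = -KL(impact model || sentence model).
    """
    sent_tokens = [tokenize(s) for s in paper_sentences]
    cite_tokens = [t for s in citing_sentences for t in tokenize(s)]
    all_tokens = cite_tokens + [t for toks in sent_tokens for t in toks]

    background = background_model(all_tokens)
    impact = smoothed_model(cite_tokens, background, mu)

    scored = []
    for sentence, toks in zip(paper_sentences, sent_tokens):
        score = -kl_divergence(impact, smoothed_model(toks, background, mu))
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    paper = [
        "We propose a congestion control scheme for wireless networks.",
        "The rest of the paper is organized as follows.",
        "Section two reviews related work on routing protocols.",
    ]
    citing = [
        "Their congestion control scheme improves throughput in wireless networks.",
        "This congestion control scheme was later extended to multi-hop wireless networks.",
    ]
    print(impact_summary(paper, citing, k=1))
```

Smoothing every model toward a shared background keeps all probabilities positive, so the divergence is always finite even when a sentence shares no words with the citation context.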