Faculty Data Query | Category: Conference Paper | Faculty: 周清江 Chichang Jou

Title: Heuristics-Based Schema Extraction for Deep Web Query Interfaces
Academic Year: 106
Semester: 1
Publication Date: 2017/08/04
Work Title: Heuristics-Based Schema Extraction for Deep Web Query Interfaces
Work Title (Other Language):
Authors: Chichang Jou; Yucheng Cheng
Affiliated Unit:
Publisher:
Conference Name: The 2017 IEEE International Conference on Information Reuse and Integration
Conference Location: San Diego, CA, USA
Abstract: With the rapid growth in popularity of the Internet, the contents of web databases are also increasing quickly. These data, hidden behind query interfaces, are called the Deep Web. The volume of deep web content has been estimated at around 500 times that of the surface web. To obtain dynamic content that satisfies the conditions imposed by the interface elements, Internet users must fill in valid values; this is why such content is not collected by search engines. Many deep web applications, such as content collection, topic-focused crawling, and data integration, depend on understanding the schemas of these query interfaces. Such a schema needs to cover the mappings between input elements and labels, the data types of valid input values, the range constraints on input values, and so on. We propose a Heuristics-based deep web query interface Schema Extraction system (HSE) that identifies labels, elements, mappings between labels and elements, and relationships among elements. In HSE, the texts surrounding elements are collected as candidate labels. We propose a string similarity definition and a dynamic similarity threshold to cleanse or modify candidate labels. The elements, candidate labels, and new lines in a query interface are streamlined to produce its Interface Expression (IEXP). By combining the users' view and the designer's view, with the aid of semantic information, we then build heuristic rules to extract schemas from the IEXPs of query interfaces in the ICQ dataset. These rules exploit (1) the characteristics of labels and elements, and (2) the spatial, group, and range relationships among labels and elements. Our schema not only helps extract deep web content but also benefits schema matching and schema merging. Experimental results on the TEL-8 dataset show that HSE performs effectively.
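The abstract describes the HSE pipeline only at a high level (candidate labels, a string similarity with a dynamic threshold, an IEXP, and heuristic label-to-element rules). The Python sketch below illustrates the general shape of such a pipeline under loudly stated assumptions; it is not the paper's method. Normalized edit distance stands in for the paper's string similarity definition, and the length-dependent `dynamic_threshold` rule, the `T`/`E`/`|` IEXP notation, the `Token` type, and the "a label maps to the next element on the same line" heuristic in `map_labels` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str       # HYPOTHETICAL kinds: "text" (candidate label), "element" (input field), "newline"
    value: str = ""


def normalize(s: str) -> str:
    # Lowercase and trim spaces, colons, and asterisks (e.g. "Author:" -> "author").
    return s.strip(" :*").lower()


def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    # ASSUMPTION: 1 - normalized edit distance stands in for the paper's definition.
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def dynamic_threshold(a: str, b: str) -> float:
    # HYPOTHETICAL rule: short strings must match more closely than long ones.
    return 0.9 if min(len(normalize(a)), len(normalize(b))) <= 5 else 0.75


def cleanse(candidates: list[str]) -> list[str]:
    # Keep a candidate label only if it is below the threshold against every kept label.
    kept: list[str] = []
    for cand in candidates:
        if all(similarity(cand, k) < dynamic_threshold(cand, k) for k in kept):
            kept.append(cand)
    return kept


def to_iexp(tokens: list[Token]) -> str:
    # HYPOTHETICAL notation: T = label text, E = element, | = new line.
    symbols = {"text": "T", "element": "E", "newline": "|"}
    return " ".join(symbols[t.kind] for t in tokens)


def map_labels(tokens: list[Token]) -> dict[str, str]:
    # HYPOTHETICAL heuristic: a label maps to the next element on the same line.
    mapping: dict[str, str] = {}
    pending = None
    for t in tokens:
        if t.kind == "text":
            pending = t.value
        elif t.kind == "element" and pending is not None:
            mapping[t.value] = pending
            pending = None
        elif t.kind == "newline":
            pending = None
    return mapping
```

For a two-line form ("Title: [input]" then "Author: [input]"), `to_iexp` yields `T E | T E |` and `map_labels` pairs each input with the label to its left. With these assumed thresholds, `cleanse(["Author:", "author", "Keywords", "Keyword", "Title", "Titles"])` merges "author" into "Author:" and "Keyword" into "Keywords" (similarity 0.875 against a 0.75 threshold for longer strings) but keeps "Titles" apart from "Title" (0.83 against a 0.9 threshold for short strings), which is the kind of behavior a length-aware threshold is meant to provide.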
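Keywords: Deep Web, Query Interface, Schema Extraction, Heuristic Rules, String Similarity
Language: English (United States)
Included In:
Conference Type: International
On-Campus Conference Venue:
Conference Dates: 2017/08/04 ~ 2017/08/06
Corresponding Author:
Country: United States
Open Call for Papers:
Publication Format:
Source:
Related Links: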