Implementation of a System for Removing Noisy Hyperlinks: A Semantic and Relatedness-Based Approach

Document Type: Original Article


Faculty of Electrical and Computer Engineering, Hakim Sabzevari University, Sabzevar, Iran.


As the volume of data on the web grows, the web structure graph, which represents the web as a graph, evolves with it. The structure of this graph has shifted from content-based to non-content-based. In addition, spam data in the web structure graph, such as noisy hyperlinks, can degrade the speed and efficiency of information retrieval and link mining algorithms. Previous research in this field has concentrated on eliminating noisy hyperlinks with structural and string-based methods. However, these methods may mistakenly remove valuable links or fail to identify noisy hyperlinks in certain situations. In this paper, we first build a dataset of hyperlinks using an interactive crawler. We then examine the semantic and relatedness structure of the hyperlinks using semantic web tools such as the DBpedia ontology. Noisy hyperlinks are then removed by running a reasoner over the DBpedia ontology. Our experiments demonstrate the accuracy and effectiveness of semantic web technologies in eliminating noisy hyperlinks.
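The idea of judging a hyperlink by the semantic relatedness of its two endpoints can be illustrated with a minimal sketch. It is not the system described in this paper: it replaces the DBpedia ontology with a small hypothetical class hierarchy, and it replaces the reasoner with a Wu-Palmer-style similarity score computed over that hierarchy. The class names, URLs, the 0.5 threshold, and the functions `ancestors`, `relatedness`, and `filter_noisy` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy hierarchy (child -> parent). It only stands in for a
# small slice of an ontology such as DBpedia; it is not real DBpedia data.
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "Thing": None,
    "Agent": "Thing",
    "Organisation": "Agent",
    "University": "Organisation",
    "Company": "Organisation",
    "Place": "Thing",
    "City": "Place",
}

Link = Tuple[str, str]


def ancestors(cls: str) -> List[str]:
    """Return cls followed by its superclasses, most specific first."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = SUBCLASS_OF[current]
    return chain


def relatedness(a: str, b: str) -> float:
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b)).

    Depth counts the root as 1; lcs is the deepest class that is a
    superclass of (or equal to) both a and b.
    """
    chain_a, chain_b = ancestors(a), ancestors(b)
    in_b = set(chain_b)
    lcs = next(c for c in chain_a if c in in_b)  # the root is always shared
    return 2 * len(ancestors(lcs)) / (len(chain_a) + len(chain_b))


def filter_noisy(
    links: List[Link],
    page_class: Dict[str, str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[List[Link], List[Link]]:
    """Split links into (kept, noisy) by endpoint relatedness.

    Links with an unclassified endpoint are kept, so that a missing
    annotation never causes a potentially valuable link to be removed.
    """
    kept: List[Link] = []
    noisy: List[Link] = []
    for source, target in links:
        src_cls = page_class.get(source)
        dst_cls = page_class.get(target)
        if src_cls is None or dst_cls is None:
            kept.append((source, target))
        elif relatedness(src_cls, dst_cls) >= threshold:
            kept.append((source, target))
        else:
            noisy.append((source, target))
    return kept, noisy


# Hypothetical pages and links used only to exercise the sketch.
PAGE_CLASS = {
    "https://example.org/uni": "University",
    "https://example.org/startup": "Company",
    "https://example.org/town": "City",
}
LINKS = [
    ("https://example.org/uni", "https://example.org/startup"),
    ("https://example.org/uni", "https://example.org/town"),
    ("https://example.org/uni", "https://example.org/unknown"),
]
KEPT, NOISY = filter_noisy(LINKS, PAGE_CLASS)
```

In this toy data the university-to-company link is kept because both classes share the specific superclass `Organisation`, while the university-to-city link is flagged because the two classes meet only at the root. A reasoner over a full ontology could draw on much richer relations than a single subclass chain, so the sketch shows only the general shape of a relatedness-based filter.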