Abstract
The World Wide Web, viewed as a social network, reflects changing interests in particular domains. Free online content available through blogs, wikis, news media and online forums has been shown to be a valuable source of information for identifying trends in those domains. From this data, one can construct ontologies that describe the information and provide a semantically correct overview of a domain. Tracked over time, these ontologies also enable a user to identify trends and hypes. The decentralised structure of the Internet, the huge amount of data and emerging Web 2.0 technologies pose several challenges to a crawling system for ontology learning, evolution and trend analysis. This paper presents a distributed crawling system with browser integration for Web 2.0. The proposed crawler is a high-performance Web data retrieval system designed to gather browser-equivalent textual Web content and prepare it for ontology learning.
Original language | English |
---|---|
Pages (from-to) | 114-119 |
Number of pages | 6 |
Journal | Journal of Digital Information Management |
Volume | 7 |
Issue number | 2 |
Publication status | Published - 2009 |