Mining Large Samples of Web-Based Corpora

Arno Scharl, Christian Bauer

Research output: Contribution to journal › Article › Research › peer-review

Abstract

This paper presents a method to automatically mirror, process, and compare large samples of text corpora from Web-based information systems. The wealth of textual information contained in publicly available Web sites is converted into aggregated representations through textual analysis. The application of word lists, keyword analysis, term clustering, and correspondence analyses to identify and represent semantic relationships, including their longitudinal patterns, is illustrated through a case study that investigates the global coverage of solar power technologies in international media. The resulting graphs, indicators, and tables describe complex relationships and developments that are hard to capture in traditional ways. As such, they facilitate investigations into the nature and dynamics of Web content.
Original language: English
Pages (from-to): 229-233
Number of pages: 5
Journal: Knowledge-Based Systems
Volume: 17
Issue number: 5-6
Publication status: Published - Aug 2004

Keywords

  • Web mining; Content analysis; Renewable Energy; Online media
