Identifying hidden semantic structures in Instagram data: A topic modelling comparison

Roman Egger, Joanne Yu

Research output: Contribution to journal / Article / Research / peer-review

Abstract

Purpose – Motivated by the methodological challenges posed by complex text, this study evaluates the effectiveness of different topic modelling algorithms on Instagram textual data.
Design/methodology/approach – By taking Instagram posts captioned with #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx), and nonnegative matrix factorisation (NMF) to uncover tourist experiences.
Findings – CorEx outperforms LDA and NMF, classifying emerging dark sites and activities into 17 distinct topics. The topics produced by LDA appear homogeneous and overlapping, whereas those extracted by NMF are too unspecific to yield deep insights.
Originality/value – This study assesses different topic modelling algorithms for knowledge extraction in the highly heterogeneous tourism industry. The findings reveal the complexity of analysing short-text social media data and support the use of CorEx for analysing Instagram content.
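NMF is one of the three algorithms compared above. As a rough illustration of what it does with caption text, the following minimal, pure-Python sketch factorises a tiny document-term count matrix V into document-topic weights W and topic-term weights H using the standard Lee and Seung multiplicative updates. It is not the authors' pipeline: the captions are invented, and the two-topic setting and helper names (`nmf`, `matmul`, `reconstruction_error`) are hypothetical. A real study would typically rely on a library implementation (for example scikit-learn's `NMF`) plus text preprocessing. LDA and CorEx are not shown.

```python
import random


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p); both are lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factorise nonnegative V (docs x terms) into W (docs x k) and H (k x terms)
    using Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Elementwise update: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Elementwise update: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius norm ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


# Hypothetical, hand-written captions (not data from the study).
captions = [
    "chernobyl exclusion zone abandoned city",
    "abandoned chernobyl reactor exclusion zone",
    "auschwitz memorial holocaust museum",
    "holocaust memorial museum remembrance",
]
vocab = sorted({t for c in captions for t in c.split()})
v = [[c.split().count(t) for t in vocab] for c in captions]  # docs x terms counts

w, h = nmf(v, k=2)
for i, row in enumerate(h):
    # The highest-weighted terms in each row of H describe one topic.
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print(f"topic {i}:", [vocab[j] for j in top])
```

Because NMF combines only nonnegative parts, each row of H reads as a weighted word list, and its top terms are what an analyst would label as a topic. The result depends on the random initialisation (`seed`), so in practice several seeds or a deterministic initialisation are usually compared.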
Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: Tourism Review
Publication status: Published, Oct 2021

Keywords

  • Instagram
  • Machine learning
  • LDA
  • Topic modelling
  • CorEx
  • NMF
