Abstract
Correct classification into the 17 Sustainable Development Goals (SDGs) proposed by the United Nations (UN) remains a challenging and compelling task, largely because the Shared Task's dataset is imbalanced. This paper presents a baseline method that combines RoBERTa with data augmentation and achieves good overall performance on this imbalanced dataset. Notably, even though the agreement between the synthetic gold labels and the real gold labels was only marginally better than would be expected by chance alone, the final scores remained acceptable.
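Agreement that is "marginally better than chance" is commonly quantified with a chance-corrected statistic. The sketch below uses Cohen's kappa as an illustration. This is an assumption, since the abstract does not name the agreement measure, and the label lists are invented toy data, not results from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences.

    Returns a value near 0 when agreement is about what chance would
    produce, and 1.0 when the two sequences agree perfectly.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement under independence, from each side's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both sequences use one identical label everywhere; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (SDG ids), for illustration only.
synthetic_gold = [1, 1, 2, 2, 3, 3, 1, 2]
real_gold = [1, 2, 2, 3, 3, 1, 1, 1]
kappa = cohen_kappa(synthetic_gold, real_gold)
print(f"kappa = {kappa:.3f}")
```

A kappa only slightly above zero would correspond to the weak alignment described above, whereas values close to 1 would indicate that the synthetic labels closely track the real ones.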
Original language | English |
---|---|
Publication status | Published - 11 Jun 2024 |