An Efficient Workflow Towards Improving Classifiers in Low-Resource Settings with Synthetic Data

Research output: Other contribution › Research › peer-review

Abstract

The correct classification of the 17 Sustainable Development Goals (SDGs) proposed by the United Nations (UN) remains a challenging and compelling task because the Shared Task’s dataset is imbalanced. This paper presents a method for building a baseline with RoBERTa and data augmentation that achieves good overall performance on this imbalanced dataset. Notably, even though the agreement between the synthetic gold labels and the real gold labels was only marginally better than would be expected by chance alone, the final scores remained acceptable.
Original language: English
Publication status: Published - 11 Jun 2024
