Are Word Embedding Methods Stable and Should We Care About It?

Published in ACM HyperText, 2021

Recommended citation: Angana Borah, Manash Pratim Barman, Amit C Awekar. (2021). "Are Word Embedding Methods Stable and Should We Care About It?" In Proceedings of the 32nd ACM Conference on Hypertext and Social Media (pp. 45-55).

Download paper here

The central idea of this paper is to measure the stability of word embedding methods (WEMs) using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, news, song lyrics, and European Parliament proceedings. We also observe the effect of WEM stability on two downstream tasks: clustering and fairness evaluation. Our experiments indicate that among the three WEMs, fastText is the most stable, followed by GloVe and Word2Vec.
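
To make the notion of similarity-based stability concrete, below is a minimal sketch, not the paper's exact protocol: it trains Word2Vec twice with different random seeds and measures how much each word's top-k nearest-neighbor lists overlap between the two runs. The toy corpus, the choice of k, the seeds, and all hyperparameter values are illustrative assumptions; only the general idea of comparing nearest neighbors across retrainings comes from the paper's intrinsic-evaluation setting.

```python
# Sketch of a seed-based stability check using gensim's Word2Vec.
# All hyperparameters and the corpus are illustrative placeholders.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["music", "lyrics", "tell", "a", "story"],
] * 100  # toy corpus; the paper uses Wikipedia, news, lyrics, Europarl

def train(seed):
    # workers=1 keeps training single-threaded so the seed fully
    # determines the run; multithreading adds ordering nondeterminism.
    return Word2Vec(corpus, vector_size=50, window=2, min_count=1,
                    seed=seed, workers=1, epochs=20)

def neighbor_overlap(m1, m2, word, k=5):
    """Fraction of word's top-k neighbors shared between two models."""
    n1 = {w for w, _ in m1.wv.most_similar(word, topn=k)}
    n2 = {w for w, _ in m2.wv.most_similar(word, topn=k)}
    return len(n1 & n2) / k

m_a, m_b = train(seed=1), train(seed=2)
shared_vocab = [w for w in m_a.wv.index_to_key if w in m_b.wv.key_to_index]
stability = sum(neighbor_overlap(m_a, m_b, w) for w in shared_vocab) / len(shared_vocab)
print(f"mean top-5 neighbor overlap: {stability:.2f}")  # 1.0 = fully stable
```

The same comparison can be run on GloVe or fastText vectors once they are loaded into a keyed-vector structure, and seed variation is only one of the training parameters whose effect on stability can be probed this way.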