emoji2vec: Emoji to vector

The techniques that created word2vec don’t stop with words: doc2vec, for example, applies them to larger blocks of text. But the topic for today is training the same kind of embedding on emoji.

This research by Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, and Sebastian Riedel trained on the emoji descriptions from the Unicode standard. That turns out to be enough to get good results, and it means you don’t have to sample millions of tweets to get a dataset with enough emoji.
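To make the idea of learning from descriptions concrete, here is a minimal sketch. It is not the authors' code: as I understand the paper, the emoji vectors are trained against the summed word2vec vectors of each description, whereas this sketch skips the training and simply uses the sum itself as the emoji vector. The tiny three-dimensional word vectors are invented for illustration, and the names `WORD_VECTORS`, `embed_description`, and `nearest` are hypothetical.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real run would load pretrained word2vec embeddings instead.
WORD_VECTORS = {
    "face":     [0.9, 0.1, 0.0],
    "grinning": [0.8, 0.3, 0.1],
    "crying":   [0.7, -0.4, 0.2],
    "cat":      [0.1, 0.9, 0.0],
    "dog":      [0.0, 0.8, 0.3],
}

# Lowercased Unicode names for a handful of emoji.
EMOJI_DESCRIPTIONS = {
    "😀": "grinning face",
    "😢": "crying face",
    "🐱": "cat face",
    "🐶": "dog face",
}


def embed_description(text):
    """Sum the vectors of the known words in a description."""
    total = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        vec = WORD_VECTORS.get(word)
        if vec is not None:  # words without a vector are skipped
            total = [t + v for t, v in zip(total, vec)]
    return total


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Simplification: the summed description vector stands in for the emoji vector.
EMOJI_VECTORS = {e: embed_description(d) for e, d in EMOJI_DESCRIPTIONS.items()}


def nearest(emoji):
    """Return the other emoji whose vector is most similar by cosine."""
    target = EMOJI_VECTORS[emoji]
    others = [e for e in EMOJI_VECTORS if e != emoji]
    return max(others, key=lambda e: cosine(target, EMOJI_VECTORS[e]))


if __name__ == "__main__":
    for emoji in EMOJI_VECTORS:
        print(emoji, "is closest to", nearest(emoji))
```

Even with made-up numbers, the two animal faces end up next to each other, which is the kind of structure the map of the vector space is meant to show. The point of the sketch is only that a short text description per emoji is enough to place it in the same space as words.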

Here’s a map of the emoji vector space:

[image: map of the emoji vector space]

https://arxiv.org/pdf/1609.08359v1.pdf