Chapter 3. Word2vec – Learning Word Embeddings

In this chapter, we will discuss a topic of paramount importance in NLP: Word2vec, a technique for learning word embeddings, that is, distributed numerical feature representations (vectors) of words. Learning word representations lies at the foundation of many NLP tasks, because these tasks rely on feature representations that preserve both the semantics of words and their context in a language. For example, the feature representation of the word forest should be very different from that of oven, as these words are rarely used in similar contexts, whereas the representations of forest and jungle should be very similar.

Note

Word2vec embeddings are called distributed representations because the semantics of a word is captured by the activation pattern of the full representation vector, rather than by a single element of the vector (for example, setting a single element to 1 and the rest to 0 for each word).
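To make the contrast concrete, here is a minimal Python sketch comparing a one-hot (single-element) representation with a dense, distributed one. The three-word vocabulary, the helper `cosine_similarity`, and the dense vector values are all assumptions chosen by hand for illustration; they are not embeddings learned by Word2vec.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vocabulary (illustration only).
vocabulary = ["forest", "jungle", "oven"]

# One-hot representation: a single element is 1, the rest are 0.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocabulary))]
    for i, word in enumerate(vocabulary)
}

# Distributed representation: meaning is spread across all elements.
# These values are hand-picked assumptions, NOT learned embeddings.
dense = {
    "forest": [0.9, 0.8, 0.1],
    "jungle": [0.8, 0.9, 0.2],
    "oven": [0.1, 0.2, 0.9],
}

for name, vectors in (("one-hot", one_hot), ("dense", dense)):
    sim_fj = cosine_similarity(vectors["forest"], vectors["jungle"])
    sim_fo = cosine_similarity(vectors["forest"], vectors["oven"])
    print(f"{name}: forest~jungle={sim_fj:.2f}, forest~oven={sim_fo:.2f}")
```

With one-hot vectors, every pair of distinct words has a similarity of zero, so the representation cannot express that forest is closer to jungle than to oven. With the dense vectors, that difference shows up directly in the similarity scores.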

We will go step by step from classical approaches to this problem to modern neural network-based methods that deliver state-of-the-art performance in finding good word representations. Figure 3.1 visualizes such learned word embeddings for a set of words on a 2D canvas, using t-SNE, a visualization technique for high-dimensional data. If you take a closer look, you will see that similar things are placed close to each other (for example, numbers in the cluster in the middle):

Figure 3.1: An example visualization of learned word embeddings using t-SNE

Note

t-Distributed Stochastic Neighbor Embedding (t-SNE)

This is a dimensionality reduction technique that projects high-dimensional data to a low-dimensional space, typically two-dimensional. This lets us see how high-dimensional data is distributed in space, which is useful because we cannot easily visualize more than three dimensions. You will learn about t-SNE in more detail in the next chapter.
