
Distributed feature representation

A distributed representation is dense: each learned concept is represented by multiple neurons simultaneously, and each neuron takes part in representing more than one concept. In other words, input data is represented on multiple, interdependent layers, each describing the data at a different level of scale or abstraction, so the representation is distributed across layers and across neurons. In this way, two types of information are captured by the network topology. On the one hand, each neuron must represent something, so this becomes a local representation. On the other hand, the topology forms a map over these local representations, with many-to-many relationships between them. Such connections capture the interactions and mutual relationships that arise when local concepts and neurons are combined to represent the whole. Such a representation has the potential to capture exponentially more variations than a local one with the same number of free parameters. In other words, it can generalize non-locally to unseen regions. It hence offers the potential for better generalization, because learning theory shows that the number of examples needed (to achieve the desired degree of generalization performance) to tune O(B) effective degrees of freedom is O(B). This is referred to as the power of distributed representation as compared to local representation (http://www.iro.umontreal.ca/~pift6266/H10/notes/mlintro.html).

An easy way to understand this is with an example. Suppose we need to represent three words. One can use the traditional one-hot encoding (vectors of length N), which is commonly used in NLP; with vectors of length N, we can represent at most N words. Such localist models are very inefficient whenever the data has a componential structure:

One-hot encoding
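A minimal sketch of what this looks like in code (the three-word vocabulary here is an illustrative assumption, not the one in the figure):

```python
# One-hot encoding: each word gets its own dimension, so vectors of
# length N can represent at most N words.
import numpy as np

vocab = ["cat", "dog", "fish"]                       # N = 3 words
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a length-N vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("dog"))   # [0. 1. 0.]
```

Adding a fourth word would force us to grow every vector to length 4, which is exactly the inefficiency noted above.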

A distributed representation of a set of shapes would look like this:

Distributed representation 
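As a sketch of what such a representation might contain (the shapes and feature names here are illustrative assumptions, not necessarily those in the figure), each shape is described by a few shared binary features instead of its own dedicated dimension:

```python
# A hypothetical distributed representation: each shape is described
# by shared binary features rather than one dedicated dimension each.
import numpy as np

features = ["has_corners", "is_elongated", "is_symmetric"]

shapes = {
    #               corners, elongated, symmetric
    "circle":    np.array([0, 0, 1]),
    "square":    np.array([1, 0, 1]),
    "rectangle": np.array([1, 1, 1]),
}

for name, vec in shapes.items():
    print(name, vec)
```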

If we wanted to represent a new shape with a sparse representation, such as one-hot encoding, we would have to increase the dimensionality. What’s nice about a distributed representation is that we may be able to represent a new shape within the existing dimensionality. Building on the previous example:

Representing new concepts using distributed representation
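Continuing the hypothetical shape features above (has_corners, is_elongated, is_symmetric), a new shape simply reuses the same three dimensions, whereas a one-hot code would need a fourth:

```python
import numpy as np

# A new shape expressed with the existing three binary features:
# no corners, elongated, symmetric (values are illustrative).
ellipse = np.array([0, 1, 1])
```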

Therefore, features/attributes that are not mutually exclusive create a combinatorially large set of distinguishable configurations: the number of distinguishable regions grows almost exponentially with the number of parameters.
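A quick toy calculation makes the growth concrete: with n binary features, a distributed code can distinguish up to 2^n configurations, while a one-hot code of the same length distinguishes only n:

```python
# Distinguishable configurations: one-hot gives n, while n binary
# (non-mutually exclusive) features give 2**n.
for n in (3, 10, 20):
    print(f"n={n:2d}: one-hot -> {n}, distributed -> {2**n}")
```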

One more concept we need to clarify is the difference between distributed and distributional. Distributed means represented as continuous activation levels across a number of elements, for example, a dense word embedding, as opposed to a one-hot encoded vector.

Distributional, on the other hand, means represented by contexts of use. For example, Word2Vec is distributional, but so are count-based word vectors, since we use the contexts of a word to model its meaning.
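A toy sketch of the distinction (all values and context words below are made up for illustration):

```python
import numpy as np

# Distributed: dense, continuous activation levels (toy values).
embedding_dog = np.array([0.21, -1.30, 0.55, 0.02])

# Distributional: counts of context words observed around "dog" in a
# toy corpus (context vocabulary and counts are illustrative).
context_vocab = ["barks", "pet", "food", "runs"]
cooccurrence_dog = np.array([12, 30, 7, 19])

# Note: Word2Vec vectors are both dense (distributed) and learned
# from contexts of use (distributional).
```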
