
Distributed feature representation

A distributed representation is dense: each learned concept is represented by multiple neurons simultaneously, and each neuron represents more than one concept. In other words, the input data is represented on multiple, interdependent layers, each describing the data at a different level of scale or abstraction, so the representation is distributed across various layers and multiple neurons. In this way, two types of information are captured by the network topology. On the one hand, each neuron must represent something, so this becomes a local representation. On the other hand, the distribution means a map is built through the topology, with a many-to-many relationship between these local representations. Such connections capture the interactions and mutual relationships that arise when local concepts and neurons are used to represent the whole. A representation of this kind has the potential to capture exponentially more variations than a local one with the same number of free parameters; in other words, it can generalize non-locally to unseen regions. It therefore offers the potential for better generalization, because learning theory shows that the number of examples needed (to achieve the desired degree of generalization performance) to tune O(B) effective degrees of freedom is O(B). This is referred to as the power of distributed representation as compared to local representation (http://www.iro.umontreal.ca/~pift6266/H10/notes/mlintro.html).

An easy way to understand this is as follows. Suppose we need to represent words with the traditional one-hot encoding commonly used in NLP: each word becomes a vector of length N with a single active entry, so at most N words can be represented. Such localist models are very inefficient whenever the data has componential structure:

One-hot encoding
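As a minimal sketch (the three-word vocabulary and the helper name one_hot are made up for illustration), one-hot encoding in Python looks like this:

import numpy as np

# Hypothetical three-word vocabulary; with one-hot encoding of length N
# we can represent at most N distinct words.
vocab = ["cat", "dog", "fish"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, vocab_size=len(vocab)):
    """Return a length-N vector with a single 1 at the word's index."""
    vec = np.zeros(vocab_size)
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("dog"))   # [0. 1. 0.]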

A distributed representation of a set of shapes would look like this:

Distributed representation 
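As a rough illustration (the shapes and attribute names below are hypothetical, not taken from the figure), each shape is encoded over a small set of shared attributes, with every attribute participating in the description of several shapes:

# Hypothetical distributed encoding of shapes over a few shared attributes.
# Each shape activates several attributes, and each attribute is reused
# across shapes, so four shapes fit in only three dimensions.
attributes = ["is_round", "has_corners", "is_elongated"]

shapes = {
    "circle":    [1, 0, 0],
    "ellipse":   [1, 0, 1],
    "square":    [0, 1, 0],
    "rectangle": [0, 1, 1],
}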

If we wanted to represent a new shape with a sparse representation such as one-hot encoding, we would have to increase the dimensionality. What is nice about a distributed representation is that we may be able to represent a new shape within the existing dimensionality. Building on the previous example:

Representing new concepts using distributed representation
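Continuing the hypothetical sketch above (the new shape name is again made up), a new concept is expressed by recombining the existing attributes, so no new dimension is needed:

# A new shape is just another combination of the same three attributes.
shapes["rounded_rectangle"] = [1, 1, 1]   # round, has corners, elongated

# A one-hot code, by contrast, would have to grow from 4 to 5 dimensions.
print(len(attributes))   # still 3
print(len(shapes))       # now 5 shapes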

Therefore, non-mutually exclusive features/attributes create a combinatorially large set of distinguishable configurations, and the number of distinguishable regions grows almost exponentially with the number of parameters.
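As a quick back-of-the-envelope check, a binary distributed code of length n can in principle distinguish 2**n configurations, whereas a one-hot code of the same length distinguishes only n:

# Distinguishable configurations for a code of length n:
# one-hot allows n, a binary distributed code allows 2**n.
for n in (3, 10, 20):
    print(f"n={n:2d}  one-hot: {n:7d}  distributed: {2 ** n:9d}")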

One more distinction we need to clarify is the difference between distributed and distributional. A distributed representation encodes meaning as continuous activation levels across a number of elements, for example a dense word embedding, as opposed to a one-hot encoded vector.

A distributional representation, on the other hand, is derived from contexts of use. For example, Word2Vec is distributional, but so are count-based word vectors, since both use the contexts in which a word appears to model its meaning.
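To make the distributional side concrete, here is a minimal count-based sketch (the toy corpus and window size are made up) that builds word vectors from co-occurrence contexts; words used in similar contexts, such as cat/dog or mat/rug, end up with similar vectors:

import numpy as np

# Toy corpus; any corpus with repeated word contexts would do.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sentence in tokens for word in sentence})
index = {word: i for i, word in enumerate(vocab)}

# Symmetric context window of size 1: each word's vector counts how often
# every other word appears next to it.
window = 1
counts = np.zeros((len(vocab), len(vocab)))
for sentence in tokens:
    for i, word in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if i != j:
                counts[index[word], index[sentence[j]]] += 1

print(vocab)
print(counts[index["cat"]])   # similar to the row for "dog"
print(counts[index["dog"]])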
