- Hands-On Meta Learning with Python
- Sudharsan Ravichandiran
Architecture of siamese networks
Now that we have a basic understanding of siamese networks, we will explore them in detail. The architecture of a siamese network is shown in the following diagram:

As you can see in the preceding diagram, a siamese network consists of two identical networks both sharing the same weights and architecture. Let's say we have two inputs, X1 and X2. We feed our input X1 to Network A, that is, fw(X1), and we feed our input X2 to Network B, that is, fw(X2). As you will notice, both of these networks have the same weights, w, and they will generate embeddings for our input, X1 and X2. Then, we feed these embeddings to the energy function, E, which will give us similarity between the two inputs.
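To make the weight sharing concrete, here is a minimal sketch in plain Python. The "network" is a hypothetical toy linear layer (the function f_w and the weights w are illustrative, not the book's architecture); the key point is that both branches call the same function with the same weights:

```python
# Both branches of a siamese network call the SAME embedding function
# with the SAME weights w, so f_w is identical for both inputs.

def f_w(x, w):
    """Shared embedding function: a toy linear map (illustrative only)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

w = [[0.5, -0.2], [0.1, 0.3]]   # one weight matrix, shared by both branches
x1, x2 = [1.0, 2.0], [1.1, 1.9]

e1 = f_w(x1, w)   # "Network A" output for X1
e2 = f_w(x2, w)   # "Network B" output for X2 -- same weights, same architecture
```

Because the weights are shared, similar inputs are mapped to nearby points in the embedding space by construction.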
It can be expressed as follows:

E_W(X1, X2) = || f_W(X1) − f_W(X2) ||
If we use Euclidean distance as our energy function, the value of E will be small when X1 and X2 are similar, and large when they are dissimilar.
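A minimal sketch of such an energy function, assuming Euclidean distance over already-computed embeddings (the vectors below are illustrative, not outputs of a real network):

```python
import math

def energy(e1, e2):
    """Euclidean distance between two embeddings:
    small for similar inputs, large for dissimilar ones."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

# Illustrative embeddings: the first pair is close, the second far apart.
similar = energy([1.0, 2.0], [1.1, 1.9])      # small E
dissimilar = energy([1.0, 2.0], [5.0, -3.0])  # large E
```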
Assume that you have two sentences, sentence 1 and sentence 2. We feed sentence 1 to Network A and sentence 2 to Network B. Let's say both Network A and Network B are LSTM networks sharing the same weights, so they generate embeddings for sentence 1 and sentence 2, respectively. We then feed these embeddings to the energy function, which gives us the similarity score between the two sentences. But how can we train our siamese network? What should the data look like? What are the features and labels? What is our objective function?
The input to a siamese network should be in pairs, (X1, X2), along with a binary label, Y ∈ {0, 1}, stating whether the input pair is a genuine pair (same) or an impostor pair (different). As you can see in the following table, we have sentence pairs, and the label indicates whether a pair is genuine (1) or an impostor (0):
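In code, such a training set is just a list of labelled pairs. The sentences below are made-up examples for illustration, not the ones from the book's table:

```python
# Each training example is (X1, X2, Y):
# Y = 1 -> genuine pair (same meaning), Y = 0 -> impostor pair (different).
pairs = [
    ("What time does the store open?", "When does the shop open?", 1),
    ("What time does the store open?", "How do I bake bread?",     0),
]

# During training, X1 goes through Network A and X2 through Network B.
genuine = [(x1, x2) for x1, x2, y in pairs if y == 1]
```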

So, what is the loss function of our siamese network? Since the goal of the siamese network is not to perform a classification task but to understand the similarity between the two input values, we use the contrastive loss function.
It can be expressed as follows:

Contrastive loss = Y·E² + (1 − Y)·max(margin − E, 0)²
In the preceding equation, Y is the true label: 1 when the two input values are similar and 0 when they are dissimilar. E is our energy function, which can be any distance measure. The margin term enforces the constraint that when two input values are dissimilar and their distance is already greater than the margin, they incur no loss.
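The contrastive loss described above can be sketched directly from its definition (margin is a hyperparameter; 1.0 below is just a common default, not a value prescribed by the book):

```python
def contrastive_loss(Y, E, margin=1.0):
    """Contrastive loss for a siamese network.
    Y = 1 (similar pair): penalise the squared distance E**2.
    Y = 0 (dissimilar pair): penalise only when E falls inside the margin."""
    return Y * E ** 2 + (1 - Y) * max(margin - E, 0) ** 2
```

Note how a dissimilar pair that is already farther apart than the margin contributes zero loss, exactly as the constraint above requires.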