[ Contrastive Learning Series ] Classic Network Models: Contrastive Learning

🤵 Author: Horizon John

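🚀 01. InstDisc
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 02. InvaSpread
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 03. CPC
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 04. CMC
- 🎨 Architecture Diagram
- 🚩 Details
🚀 05. MoCov1
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 06. SimCLRv1
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 07. MoCov2
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 08. SimCLRv2
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 09. SwAV
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 10. BYOL
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 11. SimSiam
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 12. MoCov3
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 13. DINO
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results
🚀 14. CLIP
- 🎨 Architecture Diagram
- 🚩 Details
- 🚩 Results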

🚀 01. InstDisc

📜 Paper: Unsupervised Feature Learning via Non-Parametric Instance Discrimination[CVPR 2018]

🖥️ GitHub: lemniscate.pytorch

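🎨 Architecture Diagram

Figure 1. The pipeline

🚩 Details

Each image is treated as its own class;
A memory bank stores the feature of every image after it is encoded by the network (128-dimensional);
Positive sample: the image itself + its data-augmented version;
Negative samples: other images in the dataset (4,096 samples drawn at random from the memory bank);

Hyperparameter settings:

temperature τ = 0.07;
NCE with m = 4,096 to balance performance and computing cost;
trained for 200 epochs using SGD with momentum;
batch size = 256;
learning rate is initialized to 0.03, scaled down with coefficient 0.1 every 40 epochs after the first 120 epochs;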
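The non-parametric instance classification described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: features are L2-normalized lists of floats, the memory bank is an in-memory list, the function names are hypothetical, and the loss is a sampled softmax (InfoNCE-style) over randomly drawn bank entries rather than the paper's NCE estimator. Only τ = 0.07 and the 4,096 negatives come from the settings listed above.

```python
import math
import random


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def instance_loss(feature, index, memory_bank, num_negatives=4096, tau=0.07, rng=random):
    """Loss for instance `index`: its own memory-bank entry is the positive.

    Simplification (assumption): a sampled softmax over randomly drawn
    memory-bank entries stands in for the paper's NCE estimator.
    """
    v = l2_normalize(feature)
    pos_logit = dot(v, memory_bank[index]) / tau
    # Negatives: entries of other instances, drawn uniformly at random.
    others = [j for j in range(len(memory_bank)) if j != index]
    neg_ids = rng.sample(others, min(num_negatives, len(others)))
    logits = [pos_logit] + [dot(v, memory_bank[j]) / tau for j in neg_ids]
    # -log softmax of the positive, computed with the log-sum-exp trick.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - pos_logit


def update_memory(memory_bank, index, feature):
    # After each step the instance's entry is refreshed with its new feature.
    memory_bank[index] = l2_normalize(feature)


rng = random.Random(0)
bank = [l2_normalize([rng.gauss(0, 1) for _ in range(128)]) for _ in range(10)]
print(instance_loss(bank[3], 3, bank, num_negatives=5, rng=rng))
```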

🚩 Results

🚀 02. InvaSpread

📜 Paper: Unsupervised Embedding Learning via Invariant and Spreading Instance Feature[CVPR 2019]

🖥️ GitHub: Unsupervised_Embedding_Learning

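🎨 Architecture Diagram

Figure 1. The framework

🚩 Details

No extra data structure is used to store large numbers of sample features;
Positive and negative samples all come from the same mini-batch;
A single encoder is trained end to end;
Positive samples: the image itself + its augmented view (2);
Negative samples: the other images + their augmented views ((batch size − 1) × 2);
Why the results were not strong: the batch size is too small, so there are too few negative samples;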
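The positive/negative bookkeeping inside one mini-batch can be sketched as follows. This is a minimal sketch; the row layout of the stacked embeddings (originals first, augmented views second) and the function name are hypothetical.

```python
def in_batch_pairs(batch_size):
    """Map each original image to (positive index, negative indices).

    Assumed (hypothetical) layout of the 2 * batch_size embeddings:
    rows 0..B-1 are the original images, rows B..2B-1 their augmented
    views, so rows i and i + B come from the same image.
    """
    n = 2 * batch_size
    pairs = {}
    for i in range(batch_size):
        pos = i + batch_size
        # Every other row in the mini-batch, original or augmented, is a negative.
        negatives = [j for j in range(n) if j not in (i, pos)]
        pairs[i] = (pos, negatives)
    return pairs


pairs = in_batch_pairs(4)
print(pairs[1])  # (5, [0, 2, 3, 4, 6, 7])
```

With a batch size of 256, for example, each image gets only 2 × 255 = 510 negatives, far fewer than the 4,096 that InstDisc draws from its memory bank, which is the limitation noted above.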

🚩 Results

🚀 03. CPC

📜 Paper: Representation Learning with Contrastive Predictive Coding[arXiv 2018]

🖥️ GitHub: None

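🎨 Architecture Diagram

Figure 1. Model overview

🚩 Details

Can be applied to audio, images, text, and reinforcement learning;
Treats the input as a sequence: an encoder g_enc maps each input to a feature, and an autoregressive network such as an RNN or LSTM summarizes the past features into a context that is used to predict the features of future inputs;
Positive samples: the g_enc features of the actual future inputs, which the prediction should match;
Negative samples: the g_enc features of randomly sampled inputs;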
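The CPC objective (InfoNCE) scores the true future feature against the negatives. A minimal sketch, assuming the encoder outputs and the context vector are already computed as plain lists of floats, and using the paper's log-bilinear score z_{t+k}ᵀ W_k c_t; the function name and the toy inputs are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(context, future, negatives, W):
    """InfoNCE loss for one prediction step k.

    context:   c_t, the autoregressive summary of past encodings
    future:    z_{t+k} = g_enc(x_{t+k}), the positive
    negatives: g_enc outputs of randomly sampled inputs
    W:         step-specific matrix W_k (rows of length len(context))
    """
    pred = [dot(row, context) for row in W]  # predicted future code W_k c_t
    # Log-bilinear scores z^T W_k c_t; index 0 is the positive.
    scores = [dot(future, pred)] + [dot(z, pred) for z in negatives]
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[0]


identity = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]], identity))
```

Minimizing this loss pushes the prediction toward the true future feature and away from the randomly sampled ones.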

🚩 Results

🚀 04. CMC

📜 Paper: Contrastive Multiview Coding[ECCV 2020]

🖥️ GitHub: CMC

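🎨 Architecture Diagram

Figure 1. Model overview

🚩 Details

Maximizes the mutual information between different views (e.g., sight, hearing, touch);
Dataset: NYU RGBD, with views including the original image, depth, surface normal, and segmentation;
Positive samples: different views of the same image;
Negative samples: other images;
Drawback: each view uses its own encoder, so the computation cost is high;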
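The two-view objective can be written as follows, in notation adapted from the CMC paper (the symbols are not defined in the section above, so treat them as assumptions): f_θ1 and f_θ2 are the view-specific encoders, τ is a temperature, (v_1^1, v_2^1) is a positive pair from the same image, and v_2^2, …, v_2^{k+1} are k negatives from other images.

```latex
h_\theta(\{v_1, v_2\}) = \exp\!\left(\frac{f_{\theta_1}(v_1)\cdot f_{\theta_2}(v_2)}{\lVert f_{\theta_1}(v_1)\rVert\,\lVert f_{\theta_2}(v_2)\rVert}\cdot\frac{1}{\tau}\right)

\mathcal{L}_{\text{contrast}}^{V_1,V_2} = -\,\mathbb{E}\left[\log \frac{h_\theta(\{v_1^1, v_2^1\})}{\sum_{j=1}^{k+1} h_\theta(\{v_1^1, v_2^j\})}\right]

\mathcal{L}(V_1, V_2) = \mathcal{L}_{\text{contrast}}^{V_1,V_2} + \mathcal{L}_{\text{contrast}}^{V_2,V_1}
```

Each direction anchors on one view and contrasts against the other, and the two directions are summed so that both encoders are trained.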

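The authors later also proposed that the features produced by different networks for the same input should be as similar as possible;
this uses a distillation setup (teacher net & student net);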

🚀 05. MoCov1

📜 Paper: Momentum Contrast for Unsupervised Visual Representation Learning[CVPR 2020]

🖥️ GitHub: moco

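🎨 Architecture Diagram

Figure 1. Model overview

Figure 2. Conceptual comparison of three contrastive loss mechanisms

🚩 Details

An improvement on InstDisc;
Introduces a queue to replace the large dictionary held in the memory bank, so the dictionary size no longer depends on the batch size;
Introduces a momentum encoder to solve the problem of inconsistent features in the dictionary;
The queue serves as a dynamic dictionary of key features: each newly encoded mini-batch of keys k is enqueued and the oldest keys are dequeued;
Momentum encoder: y_t = m·y_{t−1} + (1 − m)·x_t, so the output depends not only on the current input but also on the previous output (0 ≤ m ≤ 1). In MoCo this is applied to the encoder parameters, θ_k ← m·θ_k + (1 − m)·θ_q, so each new key k is produced by a slowly changing encoder and the features in the dictionary stay as consistent as possible;
All keys, the positive key and the negative keys in the queue alike, are produced by the same slowly updated momentum encoder, which keeps positives and negatives consistent;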
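The two mechanisms above can be sketched in plain Python. Assumptions: parameters are flat lists of floats, keys are placeholder strings rather than encoded vectors, and the function and class names are hypothetical. m = 0.999 is the default momentum reported in the MoCo paper, and the real queue is much larger (65,536 keys in the paper's default setting).

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q, applied element-wise."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO dictionary of keys from the momentum encoder."""

    def __init__(self, size):
        # deque(maxlen=...) drops the oldest entries once the queue is full.
        self.keys = deque(maxlen=size)

    def enqueue(self, batch_keys):
        # The newest mini-batch of keys pushes out the oldest ones.
        self.keys.extend(batch_keys)

    def negatives(self):
        return list(self.keys)


queue = KeyQueue(size=4)
queue.enqueue(["k1", "k2", "k3"])
queue.enqueue(["k4", "k5"])
print(queue.negatives())  # ['k2', 'k3', 'k4', 'k5']
print(momentum_update([1.0, 0.0], [0.0, 1.0], m=0.9))
```

With m close to 1 the key encoder moves only a tiny step toward the query encoder per iteration, which is what keeps the keys already in the queue comparable with new ones.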

🚩 Results

🚀 06. SimCLRv1

📜 Paper: A Simple Framework for Contrastive Learning of Visual Representations[ICML 2020]

🖥️ GitHub: simclr

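🎨 Architecture Diagram

Figure 1. The framework

🚩 Details

Uses more kinds of data augmentation;
Passes the encoded features through a g(·) function (an MLP projection head) before computing the loss, which trains a better feature encoder;
Uses a larger batch size;
Trains for longer;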
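The loss computed on the outputs of g(·) is the NT-Xent (normalized temperature-scaled cross-entropy) loss. A minimal sketch, assuming the 2N projections are plain lists with rows i and i + N holding the two augmented views of image i; tau = 0.5 is only an example value and the function names are hypothetical. After pretraining, g(·) is discarded and the encoder output is used for downstream tasks.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nt_xent(z, tau=0.5):
    """NT-Xent over 2N projections z = g(h); rows i and i + N are two views of one image."""
    n2 = len(z)
    n = n2 // 2
    total = 0.0
    for i in range(n2):
        pos = (i + n) % n2  # index of the other view of the same image
        # Similarities to every other row; the row itself is excluded.
        logits = [cosine(z[i], z[j]) / tau for j in range(n2) if j != i]
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum - cosine(z[i], z[pos]) / tau
    return total / n2


views = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(nt_xent(views))
```

Every other sample in the batch acts as a negative, which is why SimCLR benefits from the larger batch sizes mentioned above.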

🚩 Results

Data augmentation strategies and their effects

🚀 07. MoCov2

📜 Paper: Improved Baselines with Momentum Contrastive Learning[arXiv 2020]

🖥️ GitHub: None

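🎨 Architecture Diagram

Figure 1. A batching perspective of two optimization mechanisms for contrastive learning

🚩 Details

Borrows SimCLRv1's strategies: adds an MLP projection head, uses stronger data augmentation, trains with a cosine learning rate schedule, and trains for more epochs;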
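The cosine learning rate schedule decays the rate along half a cosine curve instead of in discrete steps (compare the step schedule listed for InstDisc above). A minimal sketch; base_lr = 0.03 and 200 epochs are example values, and updating once per epoch is an assumption.

```python
import math


def cosine_lr(base_lr, epoch, total_epochs):
    """Half-cycle cosine decay: base_lr at epoch 0, approaching 0 at the end."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))


for epoch in (0, 50, 100, 150, 199):
    print(epoch, round(cosine_lr(0.03, epoch, 200), 5))
```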

🚩 Results

🚀 08. SimCLRv2

📜 Paper: Big Self-Supervised Models are Strong Semi-Supervised Learners[NeurIPS 2020]

🖥️ GitHub: simclr

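🎨 Architecture Diagram

Figure 1. The framework

🚩 Details

Uses a larger backbone network;
Adds MLP layers to the projection head; experiments found two layers work best;
Uses a momentum encoder, following MoCo;

🚩 Results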

🚀 09. SwAV

📜 Paper: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments[NeurIPS 2020]

🖥️ GitHub: swav

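🎨 Architecture Diagram

Figure 1. Model overview

🚩 Details

Generates multiple views and uses the features of one view to predict those of another view;
Combines contrastive learning with clustering: predictions are made with respect to cluster centers (3,000 prototypes);
Swapped prediction: c·z₁ predicts Q₂ and c·z₂ predicts Q₁, where c denotes the cluster centers, z₁ and z₂ are the feature encodings of the two views, and Q₁, Q₂ are their cluster assignments (codes);
Using cluster centers removes the need to sample large numbers of negatives, which lowers the computation cost, and addresses the imbalance caused by positive samples also being counted among the negatives;
Proposes the multi-crop data augmentation strategy: the original image is cropped at multiple scales as data augmentation;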
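The swapped prediction can be sketched as follows. Assumptions: scores[b][k] is the dot product between the normalized feature z of sample b and prototype k (c·z in the notation above), the codes Q are computed with a few Sinkhorn-Knopp normalization steps as in the SwAV paper, eps, n_iters, and tau are example values, and the function names are hypothetical.

```python
import math


def log_softmax(xs):
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Soft codes Q for a B x K matrix of sample-prototype scores (no gradient in SwAV).

    Alternating normalization spreads the B samples equally over the K
    prototypes, which prevents every sample from choosing the same one.
    """
    B, K = len(scores), len(scores[0])
    m = max(max(row) for row in scores)  # shift for numerical stability
    Q = [[math.exp((s - m) / eps) for s in row] for row in scores]
    total = sum(sum(row) for row in Q)
    Q = [[q / total for q in row] for row in Q]
    for _ in range(n_iters):
        # Each prototype (column) gets total mass 1/K ...
        col_sums = [sum(Q[b][k] for b in range(B)) for k in range(K)]
        Q = [[Q[b][k] / (col_sums[k] * K) for k in range(K)] for b in range(B)]
        # ... and each sample (row) gets total mass 1/B.
        row_sums = [sum(row) for row in Q]
        Q = [[q / (rs * B) for q in row] for row, rs in zip(Q, row_sums)]
    # Rescale so each sample's code is a distribution over prototypes.
    return [[q * B for q in row] for row in Q]


def swapped_loss(scores1, scores2, tau=0.1):
    """The codes of one view are the targets for the other view's prediction."""
    q1, q2 = sinkhorn(scores1), sinkhorn(scores2)
    loss = 0.0
    for s1, s2, c1, c2 in zip(scores1, scores2, q1, q2):
        log_p1 = log_softmax([s / tau for s in s1])
        log_p2 = log_softmax([s / tau for s in s2])
        loss -= sum(c * lp for c, lp in zip(c2, log_p1))  # z1 predicts Q2
        loss -= sum(c * lp for c, lp in zip(c1, log_p2))  # z2 predicts Q1
    return loss / len(scores1)


scores_view1 = [[0.9, 0.1, -0.2], [0.3, 0.8, 0.0]]
scores_view2 = [[0.8, 0.2, -0.1], [0.2, 0.9, 0.1]]
print(swapped_loss(scores_view1, scores_view2))
```

Each sample is compared only with K prototypes rather than with a large pool of other samples, which is where the cost saving described above comes from.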

🚩 Results

🚀 10. BYOL

📜 Paper: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning[NeurIPS 2020]

📜 Blog: Understanding Self-Supervised and Contrastive Learning with “Bootstrap Your Own Latent” (BYOL)

🖥️ GitHub: byol

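🎨 Architecture Diagram

Figure 1. BYOL's architecture

🚩 Details

f_θ and f_ξ have the same network architecture but are updated differently: f_ξ is updated as a momentum encoder;
g_θ and g_ξ are MLP projection heads similar to SimCLR's and are updated the same way as f_θ and f_ξ respectively;
At the end of the online branch, z_θ passes through one more MLP to give q_θ(z_θ), which is used to predict z′_ξ when computing the loss;
At test time, y_θ is used as the output representation;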
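The loss described above contains no negative samples at all: the online prediction simply regresses the target projection. A minimal sketch, assuming vectors are plain lists of floats and the target side is treated as a constant (in BYOL no gradient flows into f_ξ and g_ξ; they change only through the momentum rule described in the MoCo section); the function names are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def byol_loss(online_pred, target_proj):
    """Squared distance between normalized q_theta(z_theta) and z'_xi.

    Equals 2 - 2 * cos(q, z'); the target side is treated as a constant.
    """
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def byol_symmetric_loss(pred1, target2, pred2, target1):
    # Each view's online prediction regresses the other view's target projection.
    return byol_loss(pred1, target2) + byol_loss(pred2, target1)


print(byol_loss([1.0, 0.0], [0.0, 3.0]))  # orthogonal vectors -> 2.0
```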

🚩 Results

🚀 11. SimSiam

📜 Paper: Exploring Simple Siamese Representation Learning[CVPR 2021]

🖥️ GitHub: simsiam

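🎨 Architecture Diagram

Figure 1. SimSiam architecture

Figure 2. Comparison on Siamese architectures

🚩 Details

Compared with BYOL, it does not use a momentum encoder for parameter updates: the two branches share weights, and a stop-gradient on one branch prevents collapse;
A summarizing work that unifies and simplifies the earlier Siamese methods;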
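The SimSiam objective can be written as below, in the SimSiam paper's notation (assumed here, since the section above does not define these symbols): x₁ and x₂ are two augmented views, z_i = f(x_i) is the encoder-plus-projector output, and p_i = h(z_i) is the output of the prediction MLP.

```latex
\mathcal{D}(p_1, z_2) = -\frac{p_1}{\lVert p_1 \rVert_2} \cdot \frac{z_2}{\lVert z_2 \rVert_2}

\mathcal{L} = \tfrac{1}{2}\,\mathcal{D}\big(p_1, \operatorname{stopgrad}(z_2)\big) + \tfrac{1}{2}\,\mathcal{D}\big(p_2, \operatorname{stopgrad}(z_1)\big)
```

The stopgrad terms are treated as constants, so each branch receives gradient only through its prediction p_i; the paper reports that removing stopgrad makes the representation collapse.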

🚩 Results

Comparisons on ImageNet linear classification

Transfer Learning

🚀 12. MoCov3

📜 Paper: An Empirical Study of Training Self-Supervised Vision Transformers[ICCV 2021, Oral]

🖥️ GitHub: moco-v3

🎨 Architecture Diagram

Figure 1. Algorithm

🚩 Details

Combines MoCov2 and SimSiam;
Replaces the backbone network with ViT;
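MoCov3 keeps MoCo's momentum encoder and a SimSiam/BYOL-style predictor on the query branch, but drops the queue: the keys come from the current (large) batch, with positives on the diagonal. The sketch below follows the symmetrized loss in the paper's Algorithm 1; tau = 0.2 is an example value, vectors are plain lists, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ctr(q, k, tau=0.2):
    """Contrastive loss with in-batch keys; the positive for q[i] is k[i] (the diagonal)."""
    q = [l2_normalize(v) for v in q]
    k = [l2_normalize(v) for v in k]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / tau for kj in k]
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse - logits[i]  # cross-entropy with label i
    # The paper's pseudocode scales the cross-entropy by 2 * tau.
    return 2 * tau * total / len(q)


def mocov3_loss(q1, q2, k1, k2, tau=0.2):
    # q: queries from the base encoder (with predictor); k: keys from the momentum encoder.
    return ctr(q1, k2, tau) + ctr(q2, k1, tau)


q = [[1.0, 0.0], [0.0, 1.0]]
print(mocov3_loss(q, q, q, q))
```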

🚩 Results

🚀 13. DINO

📜 Paper: Emerging Properties in Self-Supervised Vision Transformers[ICCV 2021]

🖥️ GitHub: dino

🎨 Architecture Diagram

Figure 1. Model overview
Figure 2. Algorithm

🚩 Details

Uses a ViT backbone;
The output P₁ of the student g_θs is used to predict the output P₂ of the teacher g_θt;
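The student-teacher loss can be sketched as below. Following the DINO paper, the teacher output is centered (a running mean C is subtracted) and sharpened (a low temperature), the student is trained with cross-entropy to match it, and the teacher's weights are an exponential moving average of the student's (not shown here). Assumptions: outputs are plain lists of logits, tps = 0.1, tpt = 0.04, and momentum = 0.9 are example values, and the function names are hypothetical.

```python
import math


def softmax(xs, temp):
    scaled = [x / temp for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(xs, temp):
    scaled = [x / temp for x in xs]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def dino_loss(teacher_out, student_out, center, tps=0.1, tpt=0.04):
    """Cross-entropy between teacher P_t and student P_s, averaged over the batch."""
    total = 0.0
    for t, s in zip(teacher_out, student_out):
        # Teacher: centering (subtract C), then sharpening (low temperature tpt).
        p_t = softmax([x - c for x, c in zip(t, center)], tpt)
        log_p_s = log_softmax(s, tps)
        total -= sum(p * lp for p, lp in zip(p_t, log_p_s))
    return total / len(teacher_out)


def update_center(center, teacher_out, momentum=0.9):
    """EMA of the batch mean of the teacher outputs."""
    n = len(teacher_out)
    batch_mean = [sum(col) / n for col in zip(*teacher_out)]
    return [momentum * c + (1.0 - momentum) * b for c, b in zip(center, batch_mean)]


t1, t2 = [[2.0, 0.5, 0.1]], [[1.8, 0.7, 0.0]]  # teacher outputs for views 1 and 2
s1, s2 = [[1.5, 0.2, 0.3]], [[1.2, 0.9, 0.1]]  # student outputs for views 1 and 2
center = [0.0, 0.0, 0.0]
loss = (dino_loss(t1, s2, center) + dino_loss(t2, s1, center)) / 2
center = update_center(center, t1 + t2)
print(loss, center)
```

Centering keeps one output dimension from dominating, while sharpening keeps the teacher's distribution from becoming uniform; the two together are what the paper uses to avoid collapse.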

🚩 Results

Self-attention from a Vision Transformer with 8 × 8 patches trained with no supervision

Reference:

对比学习论文综述【论文精读】 (Contrastive learning paper survey [in-depth paper reading])

🚀 14. CLIP

📜 Paper: Learning Transferable Visual Models From Natural Language Supervision[ICML 2021]

🖥️ GitHub: CLIP

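🎨 Architecture Diagram

Figure 1. Summary of the approach

Numpy-like pseudocode

🚩 Details

The model is trained on a very large dataset: 400 million (image, text) pairs;
Without training on any examples from the target dataset (zero-shot), the pretrained CLIP model can match a fully supervised model and sometimes exceed it; for example, on ImageNet it matches the accuracy of the original supervised ResNet-50;
Positive samples: the diagonal entries (I_i·T_i, 1 ≤ i ≤ N);
Negative samples: all off-diagonal entries;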
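The symmetric objective in CLIP's pseudocode can be reproduced in plain Python. Assumptions: the image and text embeddings are already projected into the shared space, the learned temperature is fixed at its initial value (logit scale 1/0.07), and the function names are hypothetical. Each row of the similarity matrix is a classification over texts, each column a classification over images, and the matching pair on the diagonal is the label.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_rows(logits):
    """Mean of -log softmax(row)[i], i.e. the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[i]
    return total / len(logits)


def clip_loss(image_emb, text_emb, logit_scale=1 / 0.07):
    """Symmetric InfoNCE over the N x N image-text similarity matrix."""
    I = [l2_normalize(v) for v in image_emb]
    T = [l2_normalize(v) for v in text_emb]
    # logits[i][j]: scaled cosine similarity of image i and text j.
    logits = [[logit_scale * sum(a * b for a, b in zip(ii, tj)) for tj in T] for ii in I]
    transposed = [list(col) for col in zip(*logits)]
    loss_img_to_text = cross_entropy_rows(logits)      # each image picks its text
    loss_text_to_img = cross_entropy_rows(transposed)  # each text picks its image
    return (loss_img_to_text + loss_text_to_img) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.2, 0.8]]
print(clip_loss(images, texts))
```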

🚩 Results

CLIP is much more efficient at zero-shot transfer than our image caption baseline

Linear probe performance of CLIP models in comparison with state-of-the-art computer vision models
