
[MIA 2025]CLIP in medical imaging: A survey

news 2025/7/3 8:06:58

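Paper link: CLIP in medical imaging: A survey - ScienceDirect

Project page: github.com

The English here is typed entirely by hand! It is a summarizing and paraphrasing of the original paper. Spelling and grammar mistakes may be hard to avoid; if you find any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.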

1. Thoughts

2. Close reading of the paper

2.1. Abstract

2.2. Introduction

2.3. Background

2.3.1. Contrastive language-image pre-training

2.3.2. Variants of CLIP

2.3.3. Medical image–text dataset

2.4. CLIP in medical image–text pre-training

2.4.1. Challenges of CLIP pre-training

2.4.2. Multi-scale contrast

2.4.3. Data-efficient contrast

2.4.4. Explicit knowledge enhancement

2.4.5. Others

2.4.6. Summary

2.5. CLIP-driven applications

2.5.1. Classification

2.5.2. Dense prediction

2.5.4. Summary

2.6. Comparative analysis

2.7. Discussions and future directions

2.8. Conclusion

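1. Thoughts

(1) I will probably only record what is distinctive about this paper. Basic CLIP and medical imaging are not recorded here; refer to the original paper for those. Mainly, it is too long and there is no need to copy everything over.

(2) Why does the figure style differ throughout the paper? Did each author draw one figure and then stitch them together?

(3) This leans toward a record: the paper introduces a great many different models.

2. Close reading of the paper

2.1. Abstract

①It just says that CLIP is meaningful for medical imaging and is worth exploring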

2.2. Introduction

①Limitations: poor performance on out-of-distribution data

②The trend of CLIP-related papers (left) and the medical images covered in those papers (right):

③How CLIP is used:

2.3. Background

2.3.1. Contrastive language-image pre-training

①How CLIP works (if you have not read it, go find the original CLIP paper, which is very clear and easy to follow):

②Performance of CLIP in the medical field:
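To make the objective in ① concrete, here is a minimal pure-Python sketch of a symmetric image-text contrastive loss in the style of CLIP. The toy embeddings and the fixed `temperature=0.07` are assumptions for illustration (in CLIP the temperature is learned and the embeddings come from trained encoders); this is a sketch, not the reference implementation.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i (image i, text i) is the positive."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
              for img in imgs]
    # Image-to-text direction: each row should put its mass on the diagonal.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: same thing over the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (i2t + t2i) / 2

# Toy batch of two pairs (hypothetical 2-D embeddings).
matched = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
```

With correctly paired texts the loss is close to zero, while swapping the texts makes it large, which is the behavior the pre-training objective rewards.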

2.3.2. Variants of CLIP

①It introduces some variants, but since there are no figures it is hard to remember them or to see at a glance how they differ

2.3.3. Medical image–text dataset

①Open medical datasets:

2.4. CLIP in medical image–text pre-training

①Representative CLIP-based medical models:

2.4.1. Challenges of CLIP pre-training

①Challenges of CLIP in the medical imaging field:

Modality-dependent; both local and global image/text analysis are needed

Scarce data (wasn't zero-shot generalization supposed to be good already? Why is data scarcity raised again?)

Professional knowledge is needed

2.4.2. Multi-scale contrast

①GLoRIA matches text (words) with image sub-regions:

②LoVT further assigns different weights to different sentences
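As a rough illustration of the local matching idea in ①, the sketch below lets each word attend over image sub-region features and scores the word against its attended region context. The function names, the toy 2-D features, the `temperature` value, and the plain averaging of word scores are all assumptions for illustration; the aggregation used in GLoRIA itself may differ.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(word, regions, temperature=0.1):
    """Attention of one word over all sub-regions, plus the weighted region context."""
    weights = softmax([dot(word, r) / temperature for r in regions])
    context = [sum(w * r[d] for w, r in zip(weights, regions))
               for d in range(len(word))]
    return weights, context

def local_similarity(words, regions, temperature=0.1):
    """Average similarity between each word and its attended region context."""
    scores = []
    for word in words:
        _, context = attend(word, regions, temperature)
        scores.append(cosine(word, context))
    return sum(scores) / len(scores)

# Two hypothetical sub-region features and word features.
regions = [[1.0, 0.0], [0.0, 1.0]]
weights, _ = attend([1.0, 0.0], regions)
matched_score = local_similarity([[1.0, 0.0], [0.0, 1.0]], regions)
mismatched_score = local_similarity([[-1.0, -1.0]], regions)
```

A sentence-level weighting like the one noted in ② could be layered on top by replacing the plain average with a weighted one.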

2.4.3. Data-efficient contrast

①Blindly pushing all negative pairs apart might reduce the relevance between similar diseases:

②Add descriptions or shuffle sentences

③Use medical image videos
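The concern in ① can be sketched with soft targets: instead of treating every non-paired report as a pure negative, the target distribution gives some mass to reports that describe the same disease. The similarity matrices and logits below are made-up toy values, and `soft_target_loss` is a hypothetical helper, not the loss of any specific model in the survey.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(logits, label_sims):
    """Cross-entropy between predicted image-to-text distributions and
    row-normalized label similarities used as soft targets."""
    total = 0.0
    for row, sims in zip(logits, label_sims):
        pred = softmax(row)
        norm = sum(sims)
        target = [s / norm for s in sims]
        total += -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)
    return total / len(logits)

# Samples 0 and 1 share a disease; sample 2 does not (toy setup).
hard_targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
soft_targets = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

# Logits that keep the two similar samples close together ...
keeps_similar_close = [[5.0, 5.0, 0.0], [5.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
# ... versus logits that push every non-paired sample away.
pushes_all_apart = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]

hard_close = soft_target_loss(keeps_similar_close, hard_targets)
hard_apart = soft_target_loss(pushes_all_apart, hard_targets)
soft_close = soft_target_loss(keeps_similar_close, soft_targets)
soft_apart = soft_target_loss(pushes_all_apart, soft_targets)
```

Under hard (one-hot) targets the loss prefers pushing the similar pair apart, while under soft targets it prefers keeping them close.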

2.4.4. Explicit knowledge enhancement

①Combined with a graph or knowledge graph (KG):

2.4.5. Others

2.4.6. Summary

2.5. CLIP-driven applications

2.5.1. Classification

①CLIP-based models for image classification:

（1）Zero-shot classification

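①Diagnosis example (wow, it can be done like this: framing it as binary classification):

②How Xplainer works (wow, impressive; so this is how people play with CLIP nowadays):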
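The binary framing in ① can be sketched as a softmax over a positive and a negative prompt for one finding. The embeddings below are toy vectors standing in for encoder outputs of hypothetical prompts such as "pneumonia" and "no pneumonia", and the `temperature` is an assumed value. This illustrates only the positive/negative prompt idea, not Xplainer's descriptor-based procedure.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def zero_shot_probability(image_emb, positive_emb, negative_emb, temperature=0.07):
    """Probability of the finding: softmax over (positive, negative) prompt similarities."""
    s_pos = cosine(image_emb, positive_emb) / temperature
    s_neg = cosine(image_emb, negative_emb) / temperature
    m = max(s_pos, s_neg)  # subtract the max for numerical stability
    e_pos, e_neg = math.exp(s_pos - m), math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)

# Hypothetical embeddings; in practice they come from the CLIP encoders.
image_emb = [0.8, 0.2]
positive_emb = [1.0, 0.0]   # stands in for the encoded positive prompt
negative_emb = [0.0, 1.0]   # stands in for the encoded negative prompt

p_disease = zero_shot_probability(image_emb, positive_emb, negative_emb)
p_healthy = zero_shot_probability(image_emb, negative_emb, positive_emb)
```

Repeating this per finding turns a multi-label diagnosis task into a set of independent binary decisions, with no task-specific training.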

（2）Context optimization

①Example of context optimization:

There is not much explanation here, so it is hard to get started quickly, haha
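For a quicker start, here is a toy sketch of the general idea behind context optimization: the encoders stay frozen and only a continuous context vector in the prompt is learned from a few labeled samples. Everything below is an assumption for illustration: the `tanh` "text encoder", the 2-D class tokens, the few-shot features, and the finite-difference gradient (real implementations use autograd on the actual CLIP text encoder).

```python
import math

# Frozen, hypothetical class-name token embeddings.
CLASS_TOKENS = {"normal": [1.0, 0.0], "pneumonia": [0.0, 1.0]}

def frozen_text_encoder(context, class_token):
    # Toy stand-in for a frozen text encoder applied to [context, class token].
    return [math.tanh(c + t) for c, t in zip(context, class_token)]

def loss(context, samples):
    """Mean cross-entropy of image features against context-conditioned class features."""
    names = list(CLASS_TOKENS)
    total = 0.0
    for image_feat, label in samples:
        logits = [sum(i * t for i, t in
                      zip(image_feat, frozen_text_encoder(context, CLASS_TOKENS[n])))
                  for n in names]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[names.index(label)]
    return total / len(samples)

def numeric_grad(context, samples, eps=1e-5):
    # Central finite differences; only the context vector is differentiated.
    grad = []
    for d in range(len(context)):
        up, down = context[:], context[:]
        up[d] += eps
        down[d] -= eps
        grad.append((loss(up, samples) - loss(down, samples)) / (2 * eps))
    return grad

def optimize_context(samples, steps=100, lr=0.5):
    context = [0.0, 0.0]  # learnable context, initialized to zeros
    for _ in range(steps):
        grad = numeric_grad(context, samples)
        context = [c - lr * g for c, g in zip(context, grad)]
    return context

# Hypothetical few-shot image features with labels.
few_shot = [([1.0, 0.2], "normal"), ([0.2, 1.0], "pneumonia")]
initial_loss = loss([0.0, 0.0], few_shot)
learned_context = optimize_context(few_shot)
final_loss = loss(learned_context, few_shot)
```

The point of the design is that the class tokens and the encoder never change; only the context is tuned, which is why this kind of approach suits settings with few labeled medical images.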

2.5.2. Dense prediction

①Methods:

（1）Detection

①Lists relevant models

（2）2D medical image segmentation

①Fine-tune CLIP on 2D medical image datasets

（3）3D medical image segmentation

①Examples:

（4）Others

①Representative models:

（1）Generation

①Automatically generate medical reports or medical images

（2）Medical visual question answering

①Example (this architecture is rather odd):

（3）Image–text retrieval

①Current models focus on global image features

②X-TRA:
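For reference, the global-feature retrieval mentioned in ① can be sketched as ranking candidate reports by cosine similarity to a query image embedding. This is the generic baseline only, not X-TRA itself; the report names and 3-D embeddings are hypothetical toy values.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def retrieve(query_emb, candidates, k=2):
    """Return the names of the k candidates most similar to the query embedding."""
    ranked = sorted(candidates.items(),
                    key=lambda item: cosine(query_emb, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical global report embeddings and a query image embedding.
report_embs = {
    "report_a": [0.9, 0.1, 0.0],
    "report_b": [0.0, 1.0, 0.1],
    "report_c": [0.7, 0.3, 0.2],
}
top_reports = retrieve([1.0, 0.0, 0.0], report_embs, k=2)
```

Because a single global vector summarizes the whole image, small local findings can be lost, which is the limitation that ① points to.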

2.5.4. Summary

2.6. Comparative analysis

①How Multi-modality Large Language Models (MLLMs) differ from CLIP:

②Performance of CLIP on different image sets:

2.7. Discussions and future directions

①Inter-disease similarity:

②Challenges: inconsistency between pre-training and application, incomprehensive evaluation of refined pre-training, challenges of volumetric imaging, limited scope of refined CLIP pre-training, debiasing in CLIP Models, enhancing adversarial robustness of CLIP, exploring the potential of metadata, incorporation of high-order correlations, beyond image–text alignment