📊 Medical Imaging AI Weekly Papers

Segmentation · Generation · Detection · AI Agent · Registration · Dose Calculation
🕐 2026-04-16 14:00:00 (UTC+8)
📅 2026-04-16 📄 30 Papers 🤖 arXiv API
📚 Archive

🔬 Medical Image Segmentation

7 papers
Deep learning has greatly advanced medical image segmentation, but its success relies heavily on fully supervised learning, which requires dense annotations that are costly and time-consuming for 3D volumetric scans. Barely-supervised learning reduces annotation burden by using only a few labeled slices per volume. Existing methods typically propagate sparse annotations to unlabeled slices through
👤 Shuang Zeng, Boxu Xie, Lei Zhu, Xinliang Zhang et al. (9 authors) 📅 2026-04-13 🔗 arXiv 📄 PDF
Perturbation-based explainability methods such as KernelSHAP provide model-agnostic attributions but are typically impractical for patch-based 3D medical image segmentation due to the large number of coalition evaluations and the high cost of sliding-window inference. We present an efficient KernelSHAP framework for volumetric CT segmentation that restricts computation to a user-defined region of
👤 Ricardo Coimbra Brioso, Giulio Sichili, Damiano Dei, Nicola Lambri et al. (7 authors) 📅 2026-04-13 🔗 arXiv 📄 PDF
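For background on why the coalition evaluations mentioned above are costly: standard KernelSHAP weights each coalition of features by the Shapley kernel, and informative weights concentrate on small and large coalitions, so many evaluations are needed for stable attributions. A minimal sketch of that kernel weight (general SHAP background, not this paper's code):

```python
from math import comb

def shap_kernel_weight(M: int, s: int) -> float:
    """Shapley kernel weight for a coalition of size s out of M features.

    The empty and full coalitions have infinite weight in theory and are
    usually enforced as hard constraints by the solver instead.
    """
    if s == 0 or s == M:
        return float("inf")
    return (M - 1) / (comb(M, s) * s * (M - s))
```

The weight is symmetric in `s` and `M - s`, which is why samplers draw coalition sizes from both ends; for 3D patches, `M` is the number of masked regions and each evaluation triggers a full sliding-window inference, which is the cost the paper targets.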
Semantic segmentation of histopathology images under class imbalance is typically addressed through frequency-based loss reweighting, which implicitly assumes that rare classes are difficult. However, true difficulty also arises from morphological variability, boundary ambiguity, and contextual similarity, factors that frequency cannot capture. We propose Dynamic Focal Attention (DFA), a simple and
👤 Lakmali Nadeesha Kumari, Sen-Ching Samson Cheung 📅 2026-04-15 🔗 arXiv 📄 PDF
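DFA itself is not shown in the abstract, but the two baselines it is contrasted against are standard. A minimal sketch of frequency-based reweighting (rare class gets a larger weight) next to the focal modulating factor, which down-weights by confidence rather than frequency; both are generic textbook forms, not this paper's method:

```python
def inverse_freq_weights(class_counts):
    """Frequency-based loss reweighting: weight each class by inverse
    frequency, normalized so the weights sum to the number of classes."""
    total = sum(class_counts)
    raw = [total / c for c in class_counts]
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]

def focal_factor(p_correct, gamma=2.0):
    """Focal modulation (1 - p)^gamma: down-weights easy, confidently
    classified pixels regardless of how frequent their class is."""
    return (1.0 - p_correct) ** gamma
```

The contrast is the abstract's point: `inverse_freq_weights` sees only class counts, while `focal_factor` responds to per-pixel difficulty, which can come from boundary ambiguity or morphology rather than rarity.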
Segmentation models based on deep neural networks demonstrate strong generalization for medical image segmentation. However, they often exhibit overconfidence or underconfidence, leading to unreliable confidence scores for segmentation masks, especially in ambiguous regions. This undermines the trustworthiness required for clinical deployment. Motivated by the learning-to-defer (L2D) paradigm, we
👤 Qiuyu Tian, Haoliang Sun, Yunshan Wang, Yinghuan Shi et al. (5 authors) 📅 2026-04-14 🔗 arXiv 📄 PDF
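The over- and underconfidence this abstract describes is commonly quantified with expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence to its accuracy. A minimal binned sketch (a standard diagnostic, not the paper's L2D method):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted average of |accuracy - mean confidence|
    per confidence bin. Zero means perfectly calibrated on this sample."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)
    return ece
```

An overconfident segmentation model shows bins where `avg_conf` exceeds `accuracy`, which is exactly the unreliability in ambiguous regions the abstract flags.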
In clinical practice, the robustness of deep learning models for multimodal brain tumor segmentation is severely compromised by incomplete MRI data. This vulnerability stems primarily from modality bias, where models exploit spurious correlations as shortcuts rather than learning true anatomical structures. Existing feature fusion methods fail to fundamentally eliminate this dependency. To address
👤 Bo Liu, Yulong Zou, Jin Hong 📅 2026-04-15 🔗 arXiv 📄 PDF
Medical image segmentation supports clinical workflows by precisely delineating anatomical structures and lesions. However, medical image datasets suffer from acquisition noise and annotation ambiguity, causing pervasive data uncertainty that substantially undermines model robustness. Existing research focuses primarily on model architectural improvements and predictive reli
👤 Ruiyang Li, Fang Liu, Licheng Jiao, Xinglin Xie et al. (11 authors) 📅 2026-04-13 🔗 arXiv 📄 PDF
Medical image segmentation models built on Segment Anything Model (SAM) achieve strong performance on clean benchmarks, yet their reliability often degrades under realistic image corruptions such as noise, blur, motion artifacts, and modality-specific distortions. Existing approaches address either medical-domain adaptation or corruption robustness, but not both jointly. In SAM, we find that these
👤 Jieru Li, Matthew Chen, Micky C. Nnamdi, J. Ben Tamo et al. (6 authors) 📅 2026-04-10 🔗 arXiv 📄 PDF

🎨 Medical Image Generation & Synthesis

5 papers
AI-based image reconstruction models are increasingly deployed in clinical workflows to improve image quality from noisy data, such as low-dose X-rays or accelerated MRI scans. However, these models are typically evaluated using pixel-level metrics like PSNR, leaving their impact on downstream diagnostic performance and fairness unclear. We introduce a scalable evaluation framework that applies re
👤 Matteo Wohlrapp, Niklas Bubeck, Daniel Rueckert, William Lotter 📅 2026-04-13 🔗 arXiv 📄 PDF
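PSNR, the pixel-level metric this abstract critiques, is a simple function of mean squared error, which is part of why it can miss diagnostic impact. A minimal sketch over flat pixel lists (standard definition, not the paper's evaluation framework):

```python
import math

def psnr(reference, reconstruction, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized images
    (flattened to lists here). Higher is better; identical inputs give inf."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Because PSNR averages squared error over all pixels equally, a reconstruction can score well while blurring the few pixels that carry a finding, which motivates the downstream-task evaluation the paper proposes.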
Latent diffusion models for medical image super-resolution universally inherit variational autoencoders designed for natural photographs. We show that this default choice, not the diffusion architecture, is the dominant constraint on reconstruction quality. In a controlled experiment holding all other pipeline components fixed, replacing the generic Stable Diffusion VAE with MedVAE, a domain-speci
👤 Sebastian Cajas, Ashaba Judith, Rahul Gorijavolu, Sahil Kapadia et al. (11 authors) 📅 2026-04-14 🔗 arXiv 📄 PDF
Removing patient-specific information from medical images is crucial to enable sharing and open science without compromising patient identities. However, many methods currently used for deidentification have negative effects on downstream image analysis tasks because of removal of relevant but non-identifiable information. This work presents an end-to-end deep learning framework for transforming r
👤 Adrienne Kline, Abhijit Gaonkar, Daniel Pittman, Chris Kuehn et al. (5 authors) 📅 2026-04-13 🔗 arXiv 📄 PDF
Immunohistochemistry (IHC) is essential for assessing specific immune biomarkers like Human Epidermal growth-factor Receptor 2 (HER2) in breast cancer. However, the traditional protocols for obtaining IHC stains are resource-intensive, time-consuming, and prone to structural damage. Virtual staining has emerged as a scalable alternative, but it faces significant challenges in preserving fine-grain
👤 Aasim Bin Saleem, Amr Ahmed, Ardhendu Behera, Hafeezullah Amin et al. (8 authors) 📅 2026-04-09 🔗 arXiv 📄 PDF
Interpreting chest X-rays is inherently challenging due to the overlap between anatomical structures and the subtle presentation of many clinically significant pathologies, making accurate diagnosis time-consuming even for experienced radiologists. Recent radiology-focused foundation models, such as LLaVA-Rad and Maira-2, have positioned multi-modal large language models (MLLMs) at the forefront o
👤 Shantam Srivastava, Mahesh Bhosale, David Doermann, Mingchen Gao 📅 2026-04-12 🔗 arXiv 📄 PDF

🩺 Medical Image Detection & Diagnosis

6 papers
Automated diagnosis based on color fundus photography is essential for large-scale glaucoma screening. However, existing deep learning models are typically data-driven and lack explicit integration of retinal anatomical knowledge, which limits their robustness across heterogeneous clinical datasets. Moreover, pathological cues in fundus images may appear beyond predefined anatomical regions, makin
👤 Yuzhuo Zhou, Chi Liu, Sheng Shen, Zongyuan Ge et al. (12 authors) 📅 2026-04-14 🔗 arXiv 📄 PDF
Pneumonia remains a leading cause of childhood mortality worldwide, with a heavy burden in low-resource settings such as Bangladesh where radiologist availability is limited. Most existing deep learning approaches treat pneumonia detection as a binary problem, overlooking the clinically critical distinction between bacterial and viral aetiology. This paper proposes CBAM-DenseNet121, a transfer-lea
👤 Utsho Kumar Dey 📅 2026-04-14 🔗 arXiv 📄 PDF
In diagnostic reports, experts encode complex imaging data into clinically actionable information. They describe subtle pathological findings that are meaningful in their anatomical context. Reports follow relatively consistent structures, expressing diagnostic information with few words that are often associated with tiny but consequential image observations. Standard vision language models strug
👤 Felicia Bader, Philipp Seeböck, Anastasia Bartashova, Ulrike Attenberger et al. (5 authors) 📅 2026-04-15 🔗 arXiv 📄 PDF
In this study, we propose a deep Swin-Vision Transformer-based transfer learning architecture for robust multi-cancer histopathological image classification. The proposed framework integrates a hierarchical Swin Transformer with ResNet50-based convolutional feature extraction, enabling the model to capture both long-range contextual dependencies and fine-grained local morphological patterns within
👤 Muazzem Hussain Khan, Tasdid Hasnain, Md. Jamil khan, Ruhul Amin et al. (7 authors) 📅 2026-04-10 🔗 arXiv 📄 PDF
Despite recent advancements in the field of medical image analysis with the use of pretrained foundation models, the issue of distribution shifts between cross-source images largely persists. To circumvent that issue, investigators generally train a separate model for each source. However, this method becomes expensive when we fully fine-tune pretrained large models for a single dataset, as
👤 Sanjaya Poudel, Nikita Kunwor, Raj Simkhada, Mustafa Munir et al. (6 authors) 📅 2026-04-12 🔗 arXiv 📄 PDF
Retinal cysts are formed by leakage and accumulation of fluid in the retina due to the incompetence of the retinal vasculature. These cystic spaces are significant in several ocular diseases such as age-related macular degeneration and diabetic macular edema. Optical coherence tomography is one of the predominant diagnostic techniques for imaging retinal pathologies. Segmenting and quantification
👤 Abhishek Dharmaratnakar, Aadheeshwar Vijayakumar, Suchand Dayanand 📅 2026-04-12 🔗 arXiv 📄 PDF

🤖 Medical AI Agent & VLM

6 papers
The potential of Multimodal Large Language Models (MLLMs) in the domain of medical imaging raises the demand for systematic and rigorous evaluation frameworks aligned with real-world medical imaging practice. Existing practices that report single or coarse-grained metrics lack the granularity required for specialized clinical support and fail to assess the reliability of reasoning mech
👤 Zhijie Bao, Fangke Chen, Licheng Bao, Chenhui Zhang et al. (7 authors) 📅 2026-04-15 🔗 arXiv 📄 PDF
While medical Vision-Language models (VLMs) achieve strong performance on tasks such as tumor or organ segmentation and diagnosis prediction, their opaque latent representations limit clinical trust and the ability to explain predictions. Interpretability of these multimodal representations is therefore essential for the trustworthy clinical deployment of pretrained medical VLMs. However, current
👤 Md Rakibul Haque, KM Arefeen Sultan, Tushar Kataria, Shireen Elhabian 📅 2026-04-13 🔗 arXiv 📄 PDF
Medical Vision-Language Models (VLMs) hold immense promise for complex clinical tasks, but their reasoning capabilities are often constrained by text-only paradigms that fail to ground inferences in visual evidence. This limitation not only curtails performance on tasks requiring fine-grained visual analysis but also introduces risks of visual hallucination in safety-critical applications. Thus, w
👤 Zheng Jiang, Heng Guo, Chengyu Fang, Changchen Xiao et al. (7 authors) 📅 2026-04-09 🔗 arXiv 📄 PDF
3D medical image analysis is of great importance in disease diagnosis and treatment. Recently, multimodal large language models (MLLMs) have exhibited robust perceptual capacity, strong cross-modal alignment, and promising generalizability. Therefore, they have great potential to improve the performance of medical report generation (MRG) and medical visual question answering (MVQA), which serve as
👤 Yang Yu, Dunyuan Xu, Yaoqian Li, Xiaomeng Li et al. (6 authors) 📅 2026-04-11 🔗 arXiv 📄 PDF
We introduce GazeVaLM, a public eye-tracking dataset for studying clinical perception during chest radiograph authenticity assessment. The dataset comprises 960 gaze recordings from 16 expert radiologists interpreting 30 real and 30 synthetic chest X-rays (generated by diffusion-based generative AI) under two conditions: diagnostic assessment and real-fake classification (Visual Turing test). For
👤 David Wong, Zeynep Isik, Bin Wang, Marouane Tliba et al. (25 authors) 📅 2026-04-13 🔗 arXiv 📄 PDF
Medical vision-language models (VLMs) have shown strong potential for medical visual question answering (VQA), yet their reasoning remains largely text-centric: images are encoded once as static context, and subsequent inference is dominated by language. This paradigm is fundamentally limited in clinical scenarios, where accurate answers often depend on subtle, localized visual evidence that cann
👤 Suyang Xi, Songtao Hu, Yuxiang Lai, Wangyun Dan et al. (7 authors) 📅 2026-04-10 🔗 arXiv 📄 PDF

🔄 Medical Image Registration

4 papers
Proton therapy offers superior organ-at-risk sparing but is highly sensitive to anatomical changes, making accurate deformable image registration (DIR) across longitudinal CT scans essential. Conventional DIR methods are often too slow for emerging online adaptive workflows, while existing deep learning-based approaches are primarily designed for generic benchmarks and underutilize clinically rele
👤 Caiwen Jiang, Yuzhen Ding, Mi Jia, Samir H. Patel et al. (17 authors) 📅 2026-04-15 🔗 arXiv 📄 PDF
Multi-modal image registration plays a critical role in precision medicine but faces challenges from non-linear intensity relationships and local optima. While deep learning models enable rapid inference, they often suffer from generalization collapse on unseen modalities. To address this, we propose Search-MIND, a training-free, iterative optimization framework for instance-specific registration.
👤 Boya Wang, Ruizhe Li, Chao Chen, Xin Chen 📅 2026-04-10 🔗 arXiv 📄 PDF
Objective: The study aims to address the challenge of aligning Standard Fundus Images (SFIs) and Ultra-Widefield Fundus Images (UWFIs), which is difficult due to their substantial differences in viewing range and the amorphous appearance of the retina. Currently, no specialized method exists for this task, and existing image alignment techniques lack accuracy. Methods: We propose Active Diffusio
👤 Kanggeon Lee, Su Jeong Song, Soochahn Lee, Kyoung Mu Lee 📅 2026-04-11 🔗 arXiv 📄 PDF
Registration between preoperative CT and intraoperative laparoscopic video plays a crucial role in augmented reality (AR) guidance for minimally invasive surgery. Learning-based methods have recently achieved registration errors comparable to optimization-based approaches while offering faster inference. However, many supervised methods produce coarse alignments that rely on additional optimizatio
👤 Hanyuan Zhang, Lucas He, Zijie Cheng, Abdolrahim Kadkhodamohammadi et al. (8 authors) 📅 2026-04-11 🔗 arXiv 📄 PDF

☢️ Radiation Dose Calculation

2 papers
We introduce a novel learning framework for accelerated Monte Carlo (MC) dose calculation termed Energy-Shifting. This approach leverages deep learning to synthesize 6 MV TrueBeam Linear Accelerator (LINAC) dose distributions directly from monoenergetic inputs under identical beam configurations. Unlike conventional denoising techniques, which rely on noisy low-count dose maps that compromise beam
👤 Chi-Hieu Pham, Didier Benoit, Vincent Bourbonne, Ulrike Schick et al. (5 authors) 📅 2026-04-10 🔗 arXiv 📄 PDF
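The "noisy low-count dose maps" this abstract contrasts against reflect a general Monte Carlo property: per-voxel statistical noise shrinks roughly as 1/sqrt(N) in the particle count, so clean dose maps need many histories. A toy illustration of that count dependence (not a transport code; `deposit_prob` is a made-up parameter standing in for the chance a simulated particle deposits energy in one voxel):

```python
import random

def mc_dose_estimate(n_particles, deposit_prob=0.3, seed=0):
    """Toy Monte Carlo tally: fraction of simulated particles scoring in a
    voxel. Illustrates count-dependent noise only; real MC dose engines
    transport particles through tissue with physics models."""
    rng = random.Random(seed)
    hits = sum(rng.random() < deposit_prob for _ in range(n_particles))
    return hits / n_particles
```

Running this with small versus large `n_particles` shows why denoising approaches start from low-count maps, and why this paper instead synthesizes full-statistics polyenergetic dose from monoenergetic inputs rather than from noisy tallies.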
Purpose: Accurate dose calculation is essential in radiotherapy for precise tumor irradiation while sparing healthy tissue. With the growing adoption of MRI-guided and real-time adaptive radiotherapy, fast and accurate dose calculation on CT and MRI is increasingly needed. The DoseRAD2026 dataset and challenge provide a public benchmark of paired CT and MRI data with beam-level photon and proton M
👤 Fan Xiao, Nikolaos Delopoulos, Niklas Wahl, Lennart Volz et al. (16 authors) 📅 2026-04-14 🔗 arXiv 📄 PDF