-
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-05-07 Haiwen Diao, Ying Zhang, Shang Gao, Xiang Ruan, Huchuan Lu
-
Tensorized Multi-View Low-Rank Approximation Based Robust Hand-Print Recognition IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-05-06 Shuping Zhao, Lunke Fei, Bob Zhang, Jie Wen, Pengyang Zhao
Hand-print recognition, i.e., palmprint, finger-knuckle-print (FKP), and hand-vein recognition, offers significant advantages in user convenience and hygiene, and has therefore attracted increasing enthusiasm from researchers. Seeking to handle the long-standing interference factors in hand-print images, i.e., noise, rotation, and shadow, multi-view hand-print representation has been proposed to enhance the feature expression
-
LEAPSE: Learning Environment Affordances for 3D Human Pose and Shape Estimation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-05-06 Fangzheng Tian, Sungchan Kim
We live in a 3D world where people interact with each other in the environment. Learning 3D posed humans therefore requires us to perceive and interpret these interactions. This paper proposes LEAPSE, a novel method that learns salient instance affordances for estimating a posed body from a single RGB image in a non-parametric manner. Existing methods mostly ignore the environment and estimate the
-
LSSVC: A Learned Spatially Scalable Video Coding Scheme IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-05-06 Yifan Bian, Xihua Sheng, Li Li, Dong Liu
Traditional block-based spatially scalable video coding has been studied for over twenty years. While significant advancements have been made, the scope for further improvement in compression performance is limited. Inspired by the success of learned video coding, we propose an end-to-end learned spatially scalable video coding scheme, LSSVC, which provides a new solution for scalable video coding
-
Multi-View Time-Series Hypergraph Neural Network for Action Recognition IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-05-03 Nan Ma, Zhixuan Wu, Yifan Feng, Cheng Wang, Yue Gao
Recently, action recognition has attracted considerable attention in the field of computer vision. In dynamic circumstances and against complicated backgrounds, problems such as object occlusion, insufficient light, and weak correlation of human body joints can make the accuracy of skeleton-based human action recognition very low. To address this issue, we propose a Multi-View Time-Series
-
Multi-Stage Image-Language Cross-Generative Fusion Network for Video-Based Referring Expression Comprehension IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-05-02 Yujia Zhang, Qianzhong Li, Yi Pan, Xiaoguang Zhao, Min Tan
Video-based referring expression comprehension is a challenging task that requires locating the referred object in each video frame of a given video. While many existing approaches treat this task as an object-tracking problem, their performance is heavily reliant on the quality of the tracking templates. Furthermore, when there is not enough annotation data to assist in template selection, the tracking
-
Relationship Learning From Multisource Images via Spatial-Spectral Perception Network IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-05-02 Yunhao Gao, Wei Li, Junjie Wang, Mengmeng Zhang, Ran Tao
Advances in multisource remote sensing have enabled more comprehensive observation. The adoption of deep convolutional neural networks (CNNs) naturally incorporates spatial-spectral information and has achieved promising performance in multisource data classification. However, challenges remain in extracting spatial distributions and spectral relationships, which
-
Deep Feature Statistics Mapping for Generalized Screen Content Image Quality Assessment IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-05-01 Baoliang Chen, Hanwei Zhu, Lingyu Zhu, Shiqi Wang, Sam Kwong
The statistical regularities of natural images, referred to as natural scene statistics, play an important role in no-reference image quality assessment. However, it has been widely acknowledged that screen content images (SCIs), which are typically computer generated, do not hold such statistics. Here we make the first attempt to learn the statistics of SCIs, based upon which the quality of SCIs can
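The natural-scene-statistics features this abstract builds on are commonly computed as mean-subtracted contrast-normalized (MSCN) coefficients, as in BRISQUE-style no-reference IQA. A minimal sketch, assuming a box window in place of the usual Gaussian (window size and the stabilizing constant `C` are illustrative choices, not taken from the paper):

```python
import numpy as np

def mscn(image, win=7, C=1.0):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image.

    A box window stands in for the Gaussian window used in BRISQUE-style
    NSS features; both yield near-Gaussian MSCN maps on natural images.
    """
    img = image.astype(np.float64)
    pad = win // 2
    padded = np.pad(img, pad, mode="reflect")
    kernel = np.ones(win) / win
    # local mean via a separable box filter
    mu = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    mu = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, mu)
    # local variance from E[x^2] - E[x]^2
    sq = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded ** 2)
    sq = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, sq)
    sigma = np.sqrt(np.maximum(sq - mu ** 2, 0.0))
    return (img - mu) / (sigma + C)
```

On natural images the resulting coefficients are approximately Gaussian; the abstract's point is that computer-generated screen content breaks this regularity.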
-
Occlusion-Aware Transformer With Second-Order Attention for Person Re-Identification IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-30 Yanping Li, Yizhang Liu, Hongyun Zhang, Cairong Zhao, Zhihua Wei, Duoqian Miao
Person re-identification (ReID) typically encounters varying degrees of occlusion in real-world scenarios. While previous methods have addressed this using handcrafted partitions or external cues, they often compromise semantic information or increase network complexity. In this paper, we propose a new method from a novel perspective, termed OAT. Specifically, we first use a Transformer backbone
-
QueryTrack: Joint-Modality Query Fusion Network for RGBT Tracking IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-30 Huijie Fan, Zhencheng Yu, Qiang Wang, Baojie Fan, Yandong Tang
Existing RGB-Thermal trackers usually treat intra-modal feature extraction and inter-modal feature fusion as two separate processes, so the mutual promotion of extraction and fusion is neglected. As a result, the complementary advantages of RGB-T fusion are not fully exploited, and the independent feature extraction is not adaptive to modal quality fluctuations during tracking. To address the limitations
-
Learning to Recover Spectral Reflectance From RGB Images IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-30 Dong Huo, Jian Wang, Yiming Qian, Yee-Hong Yang
This paper tackles spectral reflectance recovery (SRR) from RGB images. Since capturing ground-truth spectral reflectance and camera spectral sensitivity is challenging and costly, most existing approaches are trained on synthetic images and use the same parameters for all unseen testing images, which is suboptimal, especially when the trained models are tested on real images because they never
-
Quality-Aware Selective Fusion Network for V-D-T Salient Object Detection IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-30 Liuxin Bao, Xiaofei Zhou, Xiankai Lu, Yaoqi Sun, Haibing Yin, Zhenghui Hu, Jiyong Zhang, Chenggang Yan
Depth images and thermal images contain the spatial geometry information and surface temperature information, which can act as complementary information for the RGB modality. However, the quality of the depth and thermal images is often unreliable in some challenging scenarios, which will result in the performance degradation of the two-modal based salient object detection (SOD). Meanwhile, some researchers
-
Anisotropic Scale-Invariant Ellipse Detection IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-29 Zikai Wang, Baojiang Zhong, Kai-Kuang Ma
Detecting ellipses is a challenging low-level task indispensable to many image analysis applications. Existing ellipse detection methods commonly encounter two fundamental issues. First, detection accuracy can be lower for a small ellipse than for a large one; this introduces the scale issue. Second, detection accuracy can be lower along the minor axis than along the
-
Multi-Label Action Anticipation for Real-World Videos With Scene Understanding IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-25 Yuqi Zhang, Xiucheng Li, Hao Xie, Weijun Zhuang, Shihui Guo, Zhijun Li
With human action anticipation becoming an essential tool for many practical applications, there has been an increasing trend in developing more accurate anticipation models in recent years. Most of the existing methods target standard action anticipation datasets, in which they could produce promising results by learning action-level contextual patterns. However, the over-simplified scenarios of standard
-
Fine-Grained Recognition With Learnable Semantic Data Augmentation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-25 Yifan Pu, Yizeng Han, Yulin Wang, Junlan Feng, Chao Deng, Gao Huang
Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level
-
Mitigating Search Interference With Task-Aware Nested Search IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-24 Jiho Lee, Eunwoo Kim
Neural Architecture Search (NAS) has emerged as a promising tool in the field of AutoML for designing more accurate and efficient architectures. The majority of NAS works employ a weight-sharing technique to reduce the search cost by sharing the weights of a supernet, which is a composite of all architectures produced from the search space. Nonetheless, this method has a significant drawback in that
-
CS2DIPs: Unsupervised HSI Super-Resolution Using Coupled Spatial and Spectral DIPs IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-24 Yuan Fang, Yipeng Liu, Chong-Yung Chi, Zhen Long, Ce Zhu
In recent years, fusing high spatial resolution multispectral images (HR-MSIs) and low spatial resolution hyperspectral images (LR-HSIs) has become a widely used approach to hyperspectral image super-resolution (HSI-SR). Various unsupervised HSI-SR methods based on the deep image prior (DIP) have gained wide popularity because they require no pre-training. However, DIP-based methods often demonstrate
-
Multi-Stage Network With Geometric Semantic Attention for Two-View Correspondence Learning IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-24 Shuyuan Lin, Xiao Chen, Guobao Xiao, Hanzi Wang, Feiran Huang, Jian Weng
The removal of outliers is crucial for establishing correspondence between two images. However, when the proportion of outliers reaches nearly 90%, the task becomes highly challenging. Existing methods face limitations in effectively utilizing geometric transformation consistency (GTC) information and incorporating geometric semantic neighboring information. To address these challenges, we propose
-
Model-Based Explainable Deep Learning for Light-Field Microscopy Imaging IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-24 Pingfan Song, Herman Verinaz Jadan, Carmel L. Howe, Amanda J. Foust, Pier Luigi Dragotti
In modern neuroscience, observing the dynamics of large populations of neurons is a critical step of understanding how networks of neurons process information. Light-field microscopy (LFM) has emerged as a type of scanless, high-speed, three-dimensional (3D) imaging tool, particularly attractive for this purpose. Imaging neuronal activity using LFM calls for the development of novel computational approaches
-
Graph-Represented Distribution Similarity Index for Full-Reference Image Quality Assessment IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-24 Wenhao Shen, Mingliang Zhou, Jun Luo, Zhengguo Li, Sam Kwong
In this paper, we propose a graph-represented image distribution similarity (GRIDS) index for full-reference (FR) image quality assessment (IQA), which can measure the perceptual distance between distorted and reference images by assessing the disparities between their distribution patterns under a graph-based representation. First, we transform the input image into a graph-based representation, which
-
Learning Contrast-Enhanced Shape-Biased Representations for Infrared Small Target Detection IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-24 Fanzhao Lin, Kexin Bao, Yong Li, Dan Zeng, Shiming Ge
Detecting infrared small targets under cluttered background is mainly challenged by dim textures, low contrast and varying shapes. This paper proposes an approach to facilitate infrared small target detection by learning contrast-enhanced shape-biased representations. The approach cascades a contrast-shape encoder and a shape-reconstructable decoder to learn discriminative representations that can
-
Fine-Grained Essential Tensor Learning for Robust Multi-View Spectral Clustering IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-24 Chong Peng, Kehan Kang, Yongyong Chen, Zhao Kang, Chenglizhao Chen, Qiang Cheng
Multi-view subspace clustering (MVSC) has drawn significant attention in recent studies. In this paper, we propose a novel approach to MVSC. First, the new method is capable of preserving high-order neighbor information of the data, which provides essential and complicated underlying relationships of the data that are not straightforwardly preserved by the first-order neighbors. Second, we design log-based
-
Multi-Granularity Contrastive Cross-Modal Collaborative Generation for End-to-End Long-Term Video Question Answering IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-24 Ting Yu, Kunhao Fu, Jian Zhang, Qingming Huang, Jun Yu
Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task focusing on semantic understanding of untrimmed long-term videos and diverse free-form questions, simultaneously emphasizing comprehensive cross-modal reasoning to yield precise answers. The canonical approaches often rely on off-the-shelf feature extractors to sidestep the expensive computation overhead,
-
Exploring Video Denoising in Thermal Infrared Imaging: Physics-inspired Noise Generator, Dataset and Model IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-23 Lijing Cai, Xiangyu Dong, Kailai Zhou, Xun Cao
-
Accurate 3D Measurement of Complex Texture Objects by Height Compensation Using a Dual-Projector Structure IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-22 Pengcheng Yao, Yuchong Chen, Shaoyan Gai, Feipeng Da
Fringe projection profilometry is a widely used technique for 3D measurement due to its high accuracy and speed. However, the accuracy significantly decreases when measuring complex texture objects, especially in the junction of different colors. This paper analyzes the causes of errors resulting from complex textures and proposes a height compensation method to revise the error by employing a dual-projector
-
Classification of Small Drones Using Low-Uncertainty Micro-Doppler Signature Images and Ultra-Lightweight Convolutional Neural Network IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-19 Junhyeong Park, Jun-Sung Park
Many studies have attempted to classify small drones in response to threats posed by the technical progress of small drones. Recently, small drones have been classified utilizing convolutional neural networks (CNNs) with micro-Doppler signature (MDS) images generated from frequency-modulated continuous-wave (FMCW) radars. This study proposes a comprehensive method for classifying small drones in real-time
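A micro-Doppler signature image is, at its core, a spectrogram of the radar return. A minimal short-time Fourier transform sketch (frame length, hop, and window are illustrative choices, not the paper's settings):

```python
import numpy as np

def spectrogram(signal, frame_len=128, hop=64):
    """Magnitude STFT: the raw material of a micro-Doppler signature image."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (time, frequency)

# a pure 125 Hz tone sampled at 1 kHz should peak in bin 125 * 128 / 1000 = 16
fs, f0 = 1000, 125
t = np.arange(2 * fs) / fs
spec = spectrogram(np.sin(2 * np.pi * f0 * t))
```

For a drone return, the time-varying blade-flash frequencies trace characteristic patterns in this time-frequency image, which a CNN then classifies.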
-
Image Reconstruction for Accelerated MR Scan With Faster Fourier Convolutional Neural Networks IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-19 Xiaohan Liu, Yanwei Pang, Xuebin Sun, Yiming Liu, Yonghong Hou, Zhenchang Wang, Xuelong Li
High-quality image reconstruction from undersampled k-space data is key to accelerating MR scanning. Current deep learning methods are limited by the small receptive fields of reconstruction networks, which restrict the exploitation of long-range information and impede the mitigation of full-image artifacts, particularly in 3D reconstruction tasks. Additionally, the substantial computational
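The simplest baseline for reconstruction from undersampled k-space is zero-filling: keep the sampled frequencies, zero the rest, and inverse-transform. A minimal sketch, assuming a random sampling mask for illustration (not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))          # stand-in for an MR slice

kspace = np.fft.fft2(image)                    # fully sampled k-space
mask = rng.random(kspace.shape) < 0.3          # keep ~30% of samples
mask[:4, :4] = True                            # retain some low frequencies (FFT corner)
zero_filled = np.real(np.fft.ifft2(kspace * mask))
```

The aliasing artifacts in `zero_filled` are what learned reconstruction networks are trained to remove.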
-
Fast Continual Multi-View Clustering With Incomplete Views IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-19 Xinhang Wan, Bin Xiao, Xinwang Liu, Jiyuan Liu, Weixuan Liang, En Zhu
Multi-view clustering (MVC) has attracted broad attention due to its capacity to exploit consistent and complementary information across views. This paper focuses on a challenging issue in MVC called the incomplete continual data problem (ICDP). Specifically, most existing algorithms assume that views are available in advance and overlook the scenarios where data observations of views are accumulated
-
Multi-Relational Deep Hashing for Cross-Modal Search IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-16 Xiao Liang, Erkun Yang, Yanhua Yang, Cheng Deng
Deep cross-modal hashing retrieval has recently made significant progress. However, existing methods generally learn hash functions with pairwise or triplet supervisions, which involves learning the relevant information by splicing partial similarity between data pairs; notably, this approach only captures the data similarity locally and incompletely, resulting in sub-optimal retrieval performance
-
GLPanoDepth: Global-to-Local Panoramic Depth Estimation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-15 Jiayang Bai, Haoyu Qin, Shuichang Lai, Jie Guo, Yanwen Guo
Depth estimation is a fundamental task in many vision applications. With the popularity of omnidirectional cameras, it becomes a new trend to tackle this problem in the spherical space. In this paper, we propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image. An omnidirectional image has a full field-of-view, providing much more complete
-
ISTR: Mask-Embedding-Based Instance Segmentation Transformer IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-12 Jie Hu, Yao Lu, Shengchuan Zhang, Liujuan Cao
Transformer-based instance-level recognition has recently attracted increasing research attention due to its superior performance. However, although attempts have been made to encode masks as embeddings in Transformer-based frameworks, how to combine mask embeddings and spatial information in a transformer-based approach is still not fully explored. In this paper, we revisit the design of mask-embedding-based
-
Deep Variation Prior: Joint Image Denoising and Noise Variance Estimation Without Clean Data IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-12 Rihuan Ke
With recent deep learning based approaches showing promising results in removing noise from images, the best denoising performance has been reported in a supervised learning setup that requires a large set of paired noisy images and ground-truth data for training. The strong data requirement can be mitigated by unsupervised learning techniques; however, accurate modelling of images or noise variances
-
Saliency Guided Deep Neural Network for Color Transfer With Light Optimization IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-12 Yuming Fang, Pengwei Yuan, Chenlei Lv, Chen Peng, Jiebin Yan, Weisi Lin
Color transfer aims to change the color information of the target image according to the reference one. Many studies propose color transfer methods by analysis of color distribution and semantic relevance, which do not take the perceptual characteristics for visual quality into consideration. In this study, we propose a novel color transfer method based on the saliency information with brightness optimization
-
Single Stage Adaptive Multi-Attention Network for Image Restoration IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-10 Anas Zafar, Danyal Aftab, Rizwan Qureshi, Xinqi Fan, Pingjun Chen, Jia Wu, Hazrat Ali, Shah Nawaz, Sheheryar Khan, Mubarak Shah
Recently attention-based networks have been successful for image restoration tasks. However, existing methods are either computationally expensive or have limited receptive fields, adding constraints to the model. They are also less resilient in spatial and contextual aspects and lack pixel-to-pixel correspondence, which may degrade feature representations. In this paper, we propose a novel and computationally
-
High-Quality and Diverse Few-Shot Image Generation via Masked Discrimination IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-10 Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan
Few-shot image generation aims to generate images of high quality and great diversity with limited data. However, it is difficult for modern GANs to avoid overfitting when trained on only a few images. The discriminator can easily remember all the training samples and guide the generator to replicate them, leading to severe diversity degradation. Several methods have been proposed to relieve overfitting
-
RefQSR: Reference-Based Quantization for Image Super-Resolution Networks IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-10 Hongjae Lee, Jun-Sang Yoo, Seung-Won Jung
Single image super-resolution (SISR) aims to reconstruct a high-resolution image from its low-resolution observation. Recent deep learning-based SISR models show high performance at the expense of increased computational costs, limiting their use in resource-constrained environments. As a promising solution for computationally efficient network design, network quantization has been extensively studied
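The network quantization this abstract refers to ultimately maps float weights onto a small integer grid. A minimal symmetric uniform-quantization sketch (bit width and scaling rule are generic choices, not RefQSR's scheme):

```python
import numpy as np

def quantize_dequantize(w, bits=8):
    """Symmetric uniform quantization: float -> integer grid -> float."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for int8
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q * scale, scale                         # dequantized weights, step size

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
w_hat, scale = quantize_dequantize(w)
```

The round-trip error is bounded by half the quantization step; the research question is how to keep this error from degrading super-resolved image quality.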
-
Nonconvex Robust High-Order Tensor Completion Using Randomized Low-Rank Approximation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-10 Wenjin Qin, Hailin Wang, Feng Zhang, Weijun Ma, Jianjun Wang, Tingwen Huang
Within the tensor singular value decomposition (T-SVD) framework, existing robust low-rank tensor completion approaches have made great achievements in various areas of science and engineering. Nevertheless, these methods involve the T-SVD based low-rank approximation, which suffers from high computational costs when dealing with large-scale tensor data. Moreover, most of them are only applicable to
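Randomized low-rank approximation of the kind named in the title typically follows the Halko-Martinsson-Tropp range-finder recipe: sketch the column space with a Gaussian test matrix, then SVD a small projected matrix. A minimal sketch for a plain matrix (the paper works with tensors under T-SVD; this shows only the underlying idea):

```python
import numpy as np

def randomized_lowrank(A, rank, oversample=10, rng=None):
    """Rank-`rank` approximation of A via a randomized range finder."""
    rng = rng or np.random.default_rng(0)
    # sketch the column space of A with a Gaussian test matrix
    Y = A @ rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(Y)
    # project A onto the sketched subspace and SVD the small matrix
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]     # low-rank reconstruction
```

The cost is dominated by two thin matrix products instead of a full SVD, which is the speedup the abstract's motivation rests on.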
-
Source-Guided Target Feature Reconstruction for Cross-Domain Classification and Detection IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-09 Yifan Jiao, Hantao Yao, Bing-Kun Bao, Changsheng Xu
Existing cross-domain classification and detection methods usually apply a consistency constraint between the target sample and its self-augmentation for unsupervised learning without considering the essential source knowledge. In this paper, we propose a Source-guided Target Feature Reconstruction (STFR) module for cross-domain visual tasks, which applies source visual words to reconstruct the target
-
Relationship-Incremental Scene Graph Generation by a Divide-and-Conquer Pipeline with Feature Adapter IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-08 Xuewei Li, Guangcong Zheng, Yunlong Yu, Naye Ji, Xi Li
-
DriftRec: Adapting Diffusion Models to Blind JPEG Restoration IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-05 Simon Welker, Henry N. Chapman, Timo Gerkmann
In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels. We propose an elegant modification of the forward stochastic differential equation of diffusion models to adapt them to this restoration task and name our method DriftRec. Comparing DriftRec against an L2 regression baseline with the same network architecture
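As a generic illustration of a forward SDE in diffusion models (not DriftRec's actual modification), one Euler-Maruyama step of a mean-reverting Ornstein-Uhlenbeck process that drifts the state toward a target image can be sketched as:

```python
import numpy as np

def forward_ou_step(x, target, theta=1.0, sigma=0.5, dt=0.01, rng=None):
    """One Euler-Maruyama step of dx = theta * (target - x) dt + sigma dW.

    A mean-reverting forward process: `x` drifts toward `target` while
    Gaussian noise is injected (a generic sketch, not DriftRec's SDE).
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(x.shape)
    return x + theta * (target - x) * dt + sigma * np.sqrt(dt) * noise

# with sigma=0 the process contracts deterministically toward the target
x = np.zeros((8, 8))
target = np.ones((8, 8))
for _ in range(1000):
    x = forward_ou_step(x, target, sigma=0.0)
```

A restoration-oriented diffusion model learns to reverse such a process, mapping a corrupted endpoint back toward the clean image.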
-
Generalizing to Out-of-Sample Degradations via Model Reprogramming IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-05 Runhua Jiang, Yahong Han
Existing image restoration models are typically designed for specific tasks and struggle to generalize to out-of-sample degradations not encountered during training. While zero-shot methods can address this limitation by fine-tuning model parameters on testing samples, their effectiveness relies on predefined natural priors and physical models of specific degradations. Nevertheless, determining out-of-sample
-
Shared Manifold Regularized Joint Feature Selection for Joint Classification and Regression in Alzheimer’s Disease Diagnosis IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-04 Zhi Chen, Yongguo Liu, Yun Zhang, Jiajing Zhu, Qiaoqin Li, Xindong Wu
In Alzheimer’s disease (AD) diagnosis, joint feature selection for predicting disease labels (classification) and estimating cognitive scores (regression) with neuroimaging data has received increasing attention. In this paper, we propose a model named Shared Manifold regularized Joint Feature Selection (SMJFS) that performs classification and regression in a unified framework for AD diagnosis. For
-
Orthogonal Spatial Binary Coding Method for High-Speed 3D Measurement IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-01 Haitao Wu, Yiping Cao, Yongbo Dai, Zhimi Wei
Temporal phase unwrapping based on a single auxiliary binary coded pattern has been proven effective for high-speed 3D measurement. However, traditional spatial binary coding often leads to an imbalance between the number of periodic divisions and the number of codewords. To meet this challenge, an orthogonal spatial binary coding method with a large number of codewords is proposed in this paper. By expanding spatial multiplexing
-
Hierarchical Perceptual Noise Injection for Social Media Fingerprint Privacy Protection IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-04-01 Simin Li, Huangxinxin Xu, Jiakai Wang, Ruixiao Xu, Aishan Liu, Fazhi He, Xianglong Liu, Dacheng Tao
Billions of people share images from their daily lives on social media every day. However, their biometric information (e.g., fingerprints) could be easily stolen from these images. The threat of fingerprint leakage from social media has created a strong desire to anonymize shared images while maintaining image quality, since fingerprints act as a lifelong individual biometric password. To guard the
-
Bilateral Context Modeling for Residual Coding in Lossless 3D Medical Image Compression IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-25 Xiangrui Liu, Meng Wang, Shiqi Wang, Sam Kwong
Residual coding has gained prevalence in lossless compression, where a lossy layer is initially employed and the reconstruction errors (i.e., residues) are then losslessly compressed. The underlying principle of the residual coding revolves around the exploration of priors based on context modeling. Herein, we propose a residual coding framework for 3D medical images, involving the off-the-shelf video
-
Anomaly Detection for Medical Images Using Heterogeneous Auto-Encoder IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-29 Shuai Lu, Weihang Zhang, He Zhao, Hanruo Liu, Ningli Wang, Huiqi Li
Anomaly detection is an important task for medical image analysis, which can alleviate the reliance of supervised methods on large labelled datasets. Most existing methods use a pixel-wise self-reconstruction framework for anomaly detection. However, these studies face two challenges: 1) they tend to overfit by learning an identity mapping between the input and output, which leads to failure in
-
Region Aware Video Object Segmentation With Deep Motion Modeling IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-29 Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian
Current semi-supervised video object segmentation (VOS) methods often employ the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we introduce a Region Aware Video Object Segmentation (RAVOS) approach, which predicts regions of interest (ROIs) for efficient object segmentation and memory storage. RAVOS
-
Knowledge-Augmented Visual Question Answering With Natural Language Explanation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-28 Jiayuan Xie, Yi Cai, Jiali Chen, Ruohang Xu, Jiexin Wang, Qing Li
Visual question answering with natural language explanation (VQA-NLE) is a challenging task that requires models to not only generate accurate answers but also to provide explanations that justify the relevant decision-making processes. This task is accomplished by generating natural language sentences based on the given question-image pair. However, existing methods often struggle to ensure consistency
-
Robust Fine-Grained Visual Recognition With Neighbor-Attention Label Correction IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-28 Shunan Mao, Shiliang Zhang
Existing deep learning methods for fine-grained visual recognition often rely on large-scale, well-annotated training data. Obtaining fine-grained annotations in the wild typically requires concentration and expertise, such as fine category annotation for species recognition, instance annotation for person re-identification (re-id) and dense annotation for segmentation, which inevitably leads to label
-
Label-Aware Calibration and Relation-Preserving in Visual Intention Understanding IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 QingHongYa Shi, Mang Ye, Wenke Huang, Weijian Ruan, Bo Du
Visual intention understanding is a challenging task that explores the hidden intention behind the images of publishers in social media. Visual intention represents implicit semantics, whose ambiguous definition inevitably leads to label shifting and label blemish. The former indicates that the same image delivers intention discrepancies under different data augmentations, while the latter represents
-
Weakly-Supervised Contrastive Learning for Unsupervised Object Discovery IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai
Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation. This task is promising due to its ability to discover objects in a generic manner. We roughly categorize existing techniques into two main
-
Temporal Feature Fusion for 3D Detection in Monocular Video IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 Haoran Cheng, Liang Peng, Zheng Yang, Binbin Lin, Xiaofei He, Boxi Wu
Previous monocular 3D detection works focus on the single frame input in both training and inference. In real-world applications, temporal and motion information naturally exists in monocular video. It is valuable for 3D detection but under-explored in monocular works. In this paper, we propose a straightforward and effective method for temporal feature fusion, which exhibits low computation cost and
-
Instance-Specific Semantic Augmentation for Long-Tailed Image Classification IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 Jiahao Chen, Bing Su
Recent long-tailed classification methods generally adopt the two-stage pipeline and focus on learning the classifier to tackle the imbalanced data in the second stage via re-sampling or re-weighting, but the classifier is easily prone to overconfidence in head classes. Data augmentation is a natural way to tackle this issue. Existing augmentation methods either perform low-level transformations or
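Semantic data augmentation of the kind discussed here perturbs deep features along class-conditional covariance directions instead of transforming pixels. A minimal sketch in the spirit of ISDA (the covariance estimate and strength `lam` are illustrative, not the paper's instance-specific scheme):

```python
import numpy as np

def semantic_augment(features, labels, lam=0.5, rng=None):
    """Sample augmented features from N(f, lam * Sigma_class).

    Sigma_class is the per-class feature covariance, so perturbations
    follow directions of intra-class semantic variation.
    """
    rng = rng or np.random.default_rng(0)
    out = features.copy()
    for c in np.unique(labels):
        idx = labels == c
        cov = np.cov(features[idx], rowvar=False) + 1e-6 * np.eye(features.shape[1])
        out[idx] += rng.multivariate_normal(
            np.zeros(features.shape[1]), lam * cov, size=idx.sum())
    return out
```

For tail classes with few samples, the covariance estimate is poor, which is precisely the problem instance-specific schemes aim to address.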
-
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 Zheng Zhang, Xu Yuan, Lei Zhu, Jingkuan Song, Liqiang Nie
Despite remarkable successes in unimodal learning tasks, backdoor attacks against cross-modal learning are still underexplored due to the limited generalization and inferior stealthiness when involving multiple modalities. Notably, since works in this area mainly inherit ideas from unimodal visual attacks, they struggle with dealing with diverse cross-modal attack circumstances and manipulating imperceptible
-
Toward Accurate Human Parsing Through Edge Guided Diffusion IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 Ting Liu, Hongkun Zhu, Yunchao Wei, Shikui Wei, Yao Zhao, Yanning Zhang
Existing human parsing frameworks commonly employ joint learning of semantic edge detection and human parsing to facilitate the localization around boundary regions. Nevertheless, the parsing prediction within the interior of the part contour may still exhibit inconsistencies due to the inherent ambiguity of fine-grained semantics. In contrast, binary edge detection does not suffer from such fine-grained
-
In Defense of Clip-Based Video Relation Detection IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Roger Zimmermann
Video Visual Relation Detection (VidVRD) aims to detect visual relationship triplets in videos using spatial bounding boxes and temporal boundaries. Existing VidVRD methods can be broadly categorized into bottom-up and top-down paradigms, depending on their approach to classifying relations. Bottom-up methods follow a clip-based approach where they classify relations of short clip tubelet pairs and
-
Cross-Layer Contrastive Learning of Latent Semantics for Facial Expression Recognition IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 Weicheng Xie, Zhibin Peng, Linlin Shen, Wenya Lu, Yang Zhang, Siyang Song
Convolutional neural networks (CNNs) have achieved significant improvement for the task of facial expression recognition. However, current training still suffers from the inconsistent learning intensities among different layers, i.e., the feature representations in the shallow layers are not sufficiently learned compared with those in deep layers. To this end, this work proposes a contrastive learning
-
Single-Image-Based Deep Learning for Segmentation of Early Esophageal Cancer Lesions IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-27 Haipeng Li, Dingrui Liu, Yu Zeng, Shuaicheng Liu, Tao Gan, Nini Rao, Jinlin Yang, Bing Zeng
Accurate segmentation of lesions is crucial for diagnosis and treatment of early esophageal cancer (EEC). However, neither traditional nor deep learning-based methods up to today can meet the clinical requirements, with the mean Dice score - the most important metric in medical image analysis - hardly exceeding 0.75. In this paper, we present a novel deep learning approach for segmenting EEC lesions
-
DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-25 Woomin Myung, Nan Su, Jing-Hao Xue, Guijin Wang
Graph convolutional networks (GCN) have recently been studied to exploit the graph topology of the human body for skeleton-based action recognition. However, most of these methods unfortunately aggregate messages via an inflexible pattern for various action samples, lacking the awareness of intra-class variety and the suitableness for skeleton sequences, which often contain redundant or even detrimental
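The building block of skeleton-based GCNs is a normalized neighborhood aggregation, H' = D^{-1/2} (A + I) D^{-1/2} H W in the Kipf-Welling form. A minimal sketch on a toy three-joint chain:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph convolution: normalized adjacency aggregation + projection."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W

# toy 3-joint chain (e.g. shoulder - elbow - wrist)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.arange(6, dtype=float).reshape(3, 2)       # 2-d feature per joint
W = np.eye(2)                                     # identity projection
out = gcn_layer(H, A, W)
```

The fixed adjacency `A` is exactly the "inflexible pattern" the abstract criticizes; deformable variants let the aggregation pattern adapt per sample.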
-
Cross-Modal Retrieval With Noisy Correspondence via Consistency Refining and Mining IEEE Trans. Image Process. (IF 10.6) Pub Date : 2024-03-25 Xinran Ma, Mouxing Yang, Yunfan Li, Peng Hu, Jiancheng Lv, Xi Peng
The success of existing cross-modal retrieval (CMR) methods relies heavily on the assumption that the annotated cross-modal correspondence is faultless. In practice, however, the correspondence of some pairs is inevitably contaminated during data collection or annotation, leading to the so-called Noisy Correspondence (NC) problem. To alleviate the influence of NC, we propose a novel method