VDC

Versatile Data Cleanser based on Visual-Linguistic Inconsistency
by Multimodal Large Language Models

ICLR 2024
1The Chinese University of Hong Kong, Shenzhen,
2Tencent AI Lab

Figure 1: Examples of poisoned samples for backdoor attacks. The attacker maliciously manipulates a portion of clean samples by embedding triggers and changing the ground-truth labels to target labels, thereby generating poisoned samples.


Introduction

The role of data in building AI systems has recently been emphasized by the emerging concept of data-centric AI. Unfortunately, real-world datasets may contain dirty samples, such as poisoned samples from backdoor attacks, noisy labels from crowdsourcing, and even hybrids of the two. The presence of such dirty samples makes DNNs vulnerable and unreliable; hence, it is critical to detect dirty samples to improve the quality and reliability of datasets.

Existing detectors focus only on detecting either poisoned samples or noisy labels, and are often prone to weak generalization when dealing with dirty samples from other domains.

In this paper, we find that a commonality across various dirty samples is the visual-linguistic inconsistency between images and their associated labels. To capture this semantic inconsistency between modalities, we propose the versatile data cleanser (VDC), leveraging the surpassing capabilities of multimodal large language models (MLLMs) in cross-modal alignment and reasoning. It consists of three consecutive modules: the visual question generation module, which generates insightful questions about the image; the visual question answering module, which acquires the semantics of the visual content by answering the questions with an MLLM; and the visual answer evaluation module, which evaluates the inconsistency. Extensive experiments demonstrate its superior performance and generalization to various categories and types of dirty samples.

VDC Framework

Overview

We design the versatile data cleanser (VDC), a universal detection framework that harnesses the surpassing capabilities of multimodal large language models and is capable of detecting various categories and types of dirty samples. It consists of three consecutive modules: the visual question generation module, the visual question answering module, and the visual answer evaluation module.



Figure 2: The framework of Versatile Data Cleanser.


  • Visual Question Generation (VQG): The VQG module first generates insightful visual questions related to the given label, based on templates and an LLM.
  • Visual Question Answering (VQA): The VQA module then resorts to an MLLM to answer the generated visual questions about the image, acquiring the semantics of the visual content.
  • Visual Answer Evaluation (VAE): The VAE module assesses visual-linguistic inconsistency by evaluating the matching score between the semantics of the image and the label.
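
The end-to-end flow can be sketched in a few lines of Python. This is a minimal illustration only: the `llm` and `mllm` callables, the prompt wording, and the fraction-of-consistent-answers score are assumptions for exposition, not the paper's exact prompts or scoring rule.

    # Minimal sketch of the VDC pipeline. `llm` wraps a text-only LLM and
    # `mllm` a multimodal LLM; both are assumed callables (illustrative).

    def generate_questions(label, llm, n=5):
        """VQG: prompt the LLM (with a template) for questions probing the label."""
        prompt = (f"Generate {n} yes/no questions that check whether "
                  f"an image really depicts a '{label}'.")
        return llm(prompt)  # assumed to return a list of question strings

    def answer_questions(image, questions, mllm):
        """VQA: the MLLM answers each question from the visual content."""
        return [mllm(image=image, question=q) for q in questions]

    def consistency_score(questions, answers, label, llm):
        """VAE: fraction of answers judged consistent with the label
        (a simple illustrative matching score)."""
        votes = [llm(f"Question: {q}\nAnswer: {a}\n"
                     f"Is this consistent with the label '{label}'? Reply 1 or 0.")
                 for q, a in zip(questions, answers)]
        return sum(v.strip() == "1" for v in votes) / len(questions)

    def is_dirty(image, label, llm, mllm, threshold=0.5):
        """Flag a sample whose image and label are semantically inconsistent."""
        questions = generate_questions(label, llm)
        answers = answer_questions(image, questions, mllm)
        return consistency_score(questions, answers, label, llm) < threshold

Samples flagged by `is_dirty` would then be removed or relabeled before training, which is the cleansing step the framework is named for.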

Experiment Results

Results on Detecting Poisoned Samples

We consider six representative backdoor attacks to generate poisoned samples: (1) visible triggers: BadNets, Blended, TrojanNN; (2) invisible triggers: SIG, SSBA, WaNet. For all attacks, we randomly choose the same number of images from every class except the target class, embed the trigger, and then change their labels to the target label.
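
As a concrete illustration of this protocol, the sketch below stamps a BadNets-style patch trigger. The 3x3 white corner patch, the NumPy array interface, and uniform (rather than class-balanced) sampling are simplifying assumptions; the other attacks would substitute their own trigger functions.

    import numpy as np

    def poison(images, labels, target_class, num_poison, seed=0):
        """Pick samples from classes other than the target, embed a trigger,
        and flip their labels to the target label (BadNets-style sketch)."""
        rng = np.random.default_rng(seed)
        candidates = np.flatnonzero(labels != target_class)
        chosen = rng.choice(candidates, size=num_poison, replace=False)
        images, labels = images.copy(), labels.copy()
        images[chosen, -3:, -3:] = 255  # illustrative 3x3 white patch trigger
        labels[chosen] = target_class   # relabel poisoned samples to the target
        return images, labels, chosen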

Results on Detecting Noisy Labels

We experiment with two popular synthetic noise models, symmetric and asymmetric: (1) symmetric noisy labels are generated by uniform flipping, i.e., randomly flipping a ground-truth label to any of the other classes; (2) asymmetric noisy labels are generated by flipping the ground-truth label i to the next class, i.e., (i mod K) + 1, where K denotes the number of classes.
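
A minimal sketch of both noise models, assuming 1-indexed integer labels in {1, ..., K} (to match the (i mod K) + 1 formula) and a per-sample flip probability equal to the noise rate:

    import numpy as np

    def symmetric_noise(labels, noise_rate, K, seed=0):
        """Uniform flipping: replace a label with one of the other K - 1 classes."""
        rng = np.random.default_rng(seed)
        noisy = labels.copy()
        for i in np.flatnonzero(rng.random(len(labels)) < noise_rate):
            others = [c for c in range(1, K + 1) if c != labels[i]]
            noisy[i] = rng.choice(others)
        return noisy

    def asymmetric_noise(labels, noise_rate, K, seed=0):
        """Pairwise flipping to the next class: i -> (i mod K) + 1."""
        rng = np.random.default_rng(seed)
        noisy = labels.copy()
        flip = rng.random(len(labels)) < noise_rate
        noisy[flip] = (labels[flip] % K) + 1
        return noisy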

Examples of Generated Questions

BibTeX


      @inproceedings{zhu2023vdc,
        title={VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models},
        author={Zhu, Zihao and Zhang, Mingda and Wei, Shaokui and Wu, Bingzhe and Wu, Baoyuan},
        booktitle={The Twelfth International Conference on Learning Representations},
        year={2024}
      }