LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models


Reviewed: November 08, 2025

I. Introduction

VLM (Vision-Language Model): takes visual and textual inputs; used for text and image generation.

Training

→ Therefore, images are labeled in advance (Safe/Unsafe) with LlavaGuard before training.

Why?

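i) Excluding content that could compromise safety during model training creates a safe training environment.

ii) The dataset produced with LlavaGuard will be a useful resource for future research.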
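The pre-labeling step can be pictured as a simple partition of the training set into kept and excluded samples. This is a minimal sketch, assuming each sample already has a Safe/Unsafe decision available through a callable; `Sample`, `filter_training_set`, and `is_safe` are hypothetical names, and `is_safe` merely stands in for a LlavaGuard call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Sample:
    image_path: str
    caption: str


def filter_training_set(
    samples: Iterable[Sample],
    is_safe: Callable[[Sample], bool],
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (kept, excluded) using a safety decision.

    `is_safe` is a hypothetical stand-in for a LlavaGuard call that
    returns True when the image is rated "Safe".
    """
    kept: List[Sample] = []
    excluded: List[Sample] = []
    for s in samples:
        # Unsafe samples are set aside instead of being used for training.
        (kept if is_safe(s) else excluded).append(s)
    return kept, excluded


# Usage with made-up labels (illustrative only).
labels = {"a.jpg": True, "b.jpg": False, "c.jpg": True}
data = [Sample(p, "") for p in labels]
kept, excluded = filter_training_set(data, lambda s: labels[s.image_path])
print([s.image_path for s in kept])      # ['a.jpg', 'c.jpg']
print([s.image_path for s in excluded])  # ['b.jpg']
```

Keeping the excluded samples instead of discarding them makes it possible to inspect what was removed, which also fits point ii) about reusing the labeled data.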

LlavaGuard: a VLM-based safeguard. A versatile tool for assessing whether visual content complies with safety policies.

Visual and Textual Inputs ← the classifier must analyze and assess images in real time.

Input: Image, Safety Policy

↓

LlavaGuard

↓

Output:

⇒ Overall safety rating (safe/unsafe)

A specific safety category

A rationale explaining 'why' the content is deemed unsafe according to the given policy

⇒ Flexibility to handle a broad spectrum of policies, since the safety policy is supplied as part of the input
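The input/output contract above (image + policy in; rating, category, and rationale out) can be sketched as a prompt builder plus a response parser. This is a minimal sketch, assuming the model replies with a JSON object whose keys are `rating`, `category`, and `rationale`; the prompt wording, the category codes, and the names `build_prompt`, `parse_assessment`, and `SafetyAssessment` are hypothetical, not LlavaGuard's actual interface.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class SafetyAssessment:
    rating: str     # overall rating: "Safe" or "Unsafe"
    category: str   # the policy category the content falls under
    rationale: str  # why the content is (un)safe under the given policy


def build_prompt(policy: Dict[str, str]) -> str:
    """Render a policy (category code -> description) into prompt text.

    Because the policy is plain input, swapping the dict swaps the policy.
    The wording here is hypothetical.
    """
    lines = [f"{code}: {desc}" for code, desc in policy.items()]
    return (
        "Assess the image against the following safety policy.\n"
        + "\n".join(lines)
        + '\nAnswer in JSON with keys "rating", "category", "rationale".'
    )


def parse_assessment(raw: str) -> SafetyAssessment:
    """Parse an assumed JSON reply and validate the rating field."""
    data = json.loads(raw)
    rating = data["rating"]
    if rating not in ("Safe", "Unsafe"):
        raise ValueError(f"unexpected rating: {rating!r}")
    return SafetyAssessment(rating, data["category"], data["rationale"])


# Usage with an illustrative policy and a made-up model reply.
policy = {"O1": "Hate, humiliation, harassment", "O2": "Violence"}
print(build_prompt(policy))
raw = '{"rating": "Unsafe", "category": "O2", "rationale": "Depicts an assault."}'
result = parse_assessment(raw)
print(result.rating, result.category)  # Unsafe O2
```

Validating the rating on parse keeps a malformed reply from silently passing as "Safe" downstream.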

Dataset: 3242 safe / 2224 unsafe (5466 samples in total) ←→ 4571 train / 71 eval / 824 test (balanced evenly across categories and ratings)
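A quick arithmetic check shows that the two breakdowns describe the same pool of samples and gives the relative size of each split. The numbers come straight from the line above; the script only adds them up and prints shares.

```python
safe, unsafe = 3242, 2224
train, eval_, test = 4571, 71, 824

total = safe + unsafe
# Both views (by rating and by split) should cover the same samples.
assert total == train + eval_ + test

for name, n in [("train", train), ("eval", eval_), ("test", test)]:
    print(f"{name}: {n} ({n / total:.1%})")
print(f"unsafe share: {unsafe / total:.1%}")
```

Roughly 84% of the samples go to training, about 15% to testing, and only about 1% to evaluation, while unsafe samples make up roughly 41% of the data.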

Outputs

→ What it enables

Dataset auditing: automatically identifies and classifies harmful content within large-scale datasets, helping prevent unsafe content from being used for model training.
Generative model moderation: assesses the safety of images produced by text-to-image models in real time, guarding against harmful images being delivered to users.
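The moderation use case amounts to placing the safeguard between the generator and the user. This is a minimal sketch, assuming the generator returns raw image bytes and the safeguard returns the string "Safe" or "Unsafe"; `guarded_generate`, `generate`, and `assess` are hypothetical names, and the lambdas below are toy stand-ins, not real models.

```python
from typing import Callable, Optional


def guarded_generate(
    prompt: str,
    generate: Callable[[str], bytes],
    assess: Callable[[bytes], str],
) -> Optional[bytes]:
    """Return the generated image only if the safeguard rates it "Safe".

    `generate` stands in for a text-to-image model and `assess` for a
    LlavaGuard call; both are hypothetical placeholders.
    """
    image = generate(prompt)
    if assess(image) == "Safe":
        return image
    return None  # blocked: the unsafe image is never returned to the user


# Toy stand-ins so the sketch runs end to end.
fake_generate = lambda p: p.encode()
fake_assess = lambda img: "Unsafe" if b"gore" in img else "Safe"
print(guarded_generate("a cat on a sofa", fake_generate, fake_assess))  # b'a cat on a sofa'
print(guarded_generate("gore scene", fake_generate, fake_assess))       # None
```

Returning `None` instead of raising keeps the blocking decision explicit at the call site; a real deployment might also log the rationale for the blocked image.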