WebAccessVL: Making an Accessible Web via Violation-Conditioned VLM

Department of Computer Science, Purdue University

Abstract: We present a vision-language model (VLM) that automatically edits website HTML to address Web Content Accessibility Guidelines 2.0 (WCAG2) violations. We formulate this as a supervised image-conditioned program synthesis task, in which the model learns to correct HTML given both the HTML source and its rendering. We collected WebAccessVL, a new dataset of manually corrected accessibility violations, establishing paired training data. We then propose a violation-conditioned VLM that additionally conditions on the WCAG2 violation count to guide the correction process. Experiments demonstrate that our method reduces the average number of violations from 5.34 to 0.44 per website, outperforming commercial LLM APIs (Gemini Pro, GPT-4o, GPT-5, Claude 3.5). A perceptual study confirms that our edited websites preserve the original visual appearance and content.

Introduction

Web accessibility is crucial for ensuring that websites are usable by people with disabilities, yet many websites fail to meet the Web Content Accessibility Guidelines (WCAG) 2.0 standards. Manual correction of accessibility violations is time-consuming and requires specialized knowledge. In this work, we present WebAccessVL, a vision-language model that automatically edits website HTML to address WCAG2 violations.

Our approach formulates accessibility correction as a supervised image-conditioned program synthesis task. The model learns to correct HTML given both the HTML source code and its visual rendering, enabling it to understand both the structural and visual aspects of web pages. We collected WebAccessVL, a new dataset with manually corrected accessibility violations, establishing paired training data for this task.

We propose a violation-conditioned VLM that additionally conditions on the WCAG2 violation count to guide the correction process. This conditioning helps the model focus on the specific accessibility issues that need to be addressed, leading to more targeted and effective corrections.
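The paper does not publish its exact prompt format, so purely as an illustration, violation-count conditioning can be sketched as prepending the count to the text side of the multimodal input (the rendered screenshot would be passed to the VLM separately as the image input). The function name `build_prompt` and the wording below are hypothetical, not our actual implementation:

```python
def build_prompt(html: str, violation_count: int) -> str:
    """Assemble a violation-conditioned text prompt (illustrative only).

    The rendered screenshot is supplied to the VLM as a separate image
    input; this sketch formats only the text side, prepending the
    WCAG2 violation count so the model knows how many issues to fix.
    """
    return (
        f"The following webpage has {violation_count} WCAG2 "
        "accessibility violations. Rewrite the HTML to remove the "
        "violations while preserving the visual design.\n\n"
        f"```html\n{html}\n```"
    )
```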

WebAccessVL Dataset

To address the lack of publicly available datasets for training models to refine webpage HTML to follow accessibility guidelines, we collected WebAccessVL, a comprehensive dataset of paired HTML files with manually corrected accessibility violations.

2,500 webpage HTMLs
7-10 min average annotation time per webpage
26 violation categories
35.8% vision-related violations

Dataset Construction: We randomly sampled 2,500 websites from a large-scale HTML dataset, ensuring that all essential assets (images, icons) were available and saved locally for reproducibility. Each HTML was manually corrected by an annotator with advanced computer science expertise, who modified the code and repeatedly rendered the webpage to keep its visual design consistent with the original version. Annotators verified their corrections with IBM's industrial-grade accessibility checker, minimizing residual violations to the best of their ability.
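IBM's checker implements a large rule set; purely to illustrate what counting violations involves, the toy checker below (not the actual tool) flags three simple structural rules — missing `lang` attribute, missing `<title>`, and images without `alt` text — using Python's standard `html.parser`:

```python
from html.parser import HTMLParser

class MiniChecker(HTMLParser):
    """Toy checker for three purely structural WCAG2 rules:
    html_lang_exists, page_title_exists, and img_alt_valid.
    Real tools such as IBM's checker cover far more rules."""

    def __init__(self):
        super().__init__()
        self.has_lang = False
        self.has_title = False
        self.imgs_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.has_lang = True
        elif tag == "title":
            self.has_title = True
        elif tag == "img" and not attrs.get("alt"):
            self.imgs_missing_alt += 1

    def violation_count(self) -> int:
        # Each missing lang/title counts once; each alt-less image counts.
        return (not self.has_lang) + (not self.has_title) + self.imgs_missing_alt

checker = MiniChecker()
checker.feed('<html><body><img src="a.png"></body></html>')
```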

Violation Distribution

Our analysis reveals that 35.8% of violations involve vision-related factors requiring visual understanding, while 64.2% are purely language-based. This distribution highlights the importance of incorporating visual information in addition to HTML source code.

Vision-related violations:
Text Contrast Sufficient (76.63%): text-to-background contrast ratio below 3:1; fixing it requires visual understanding to maintain design consistency.
Image Alt Valid (15.90%): missing alternative text for images; vision information is needed to generate descriptions that match image content.
Style Color Misuse (6.49%): only color differences mark required fields; visual indicators such as asterisks are required.

Language-based violations:
ARIA Content in Landmark (25.08%): missing role or subsection specification for assistive technologies.
HTML Lang Exists (23.66%): no language specified on the website, which speech synthesizers need.
Page Title Exists (23.66%): missing <head> and <title> elements for assistive-technology access.
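Some of the language-based violations above are mechanical enough to fix with string rewriting, which gives useful intuition for part of what the model must learn. The regex-based sketch below (a hypothetical helper, not our method) repairs html_lang_exists and page_title_exists; vision-dependent rules such as contrast and alt text genuinely require the rendered page:

```python
import re

def fix_language_violations(html: str, lang: str = "en",
                            title: str = "Untitled page") -> str:
    """Mechanically repair two language-based violations:
    html_lang_exists and page_title_exists (illustrative sketch)."""
    # Add a lang attribute if the <html> tag lacks one.
    if re.search(r"<html\b(?![^>]*\blang=)", html):
        html = re.sub(r"<html\b", f'<html lang="{lang}"', html, count=1)
    # Add a <title> (and <head> if necessary) when missing.
    if "<title>" not in html:
        if "<head>" in html:
            html = html.replace("<head>", f"<head><title>{title}</title>", 1)
        else:
            html = re.sub(r"(<html[^>]*>)",
                          rf"\1<head><title>{title}</title></head>",
                          html, count=1)
    return html
```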

Our Method: Violation-Conditioned VLM Pipeline

Figure 1 (pipeline diagram): Our violation-conditioned VLM pipeline for web accessibility enhancement. The model takes HTML and its visual rendering as input, conditions on the WCAG2 violation count, and generates corrected HTML that addresses accessibility issues while maintaining visual fidelity.

Experimental Results

Our experiments demonstrate the effectiveness of WebAccessVL in reducing accessibility violations while maintaining visual fidelity:

Violation Reduction

Our method reduces the average number of WCAG2 violations from 5.34 to 0.44 per website, a 92% reduction.
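The 92% figure follows directly from the reported means:

```python
before, after = 5.34, 0.44          # mean WCAG2 violations per website
reduction = (before - after) / before
print(f"{reduction:.0%}")           # → 92%
```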

Baseline Comparison

WebAccessVL outperforms commercial LLM APIs including Gemini Pro, GPT-4o, GPT-5, and Claude 3.5 in reducing accessibility violations.

Visual Fidelity

A perceptual study confirms that our edited websites maintain the original visual appearance and content, ensuring that accessibility improvements do not compromise design.
Qualitative comparison (rendered screenshots): Input HTML, Our Method, GPT-4o, GPT-5, Claude 3.5, Gemini Pro.

Code Diff - Our Method vs Baselines

Side-by-side diffs compare our method's HTML edits against GPT-4o, GPT-5, Claude 3.5, and Gemini Pro.