Abstract: We present a vision-language model (VLM) that automatically edits website HTML to address Web Content Accessibility Guidelines 2 (WCAG2) violations. We formulate this as a supervised image-conditioned program synthesis task in which the model learns to correct HTML given both the source code and its rendering. We collected WebAccessVL, a new dataset of websites with manually corrected accessibility violations, providing paired training data. We then propose a violation-conditioned VLM that additionally conditions on the WCAG2 violation count to guide the correction process. Experiments demonstrate that our method reduces the average number of violations from 5.34 to 0.44 per website, outperforming commercial LLM APIs (Gemini, GPT-5). A perceptual study confirms that our edited websites maintain the original visual appearance and content.
Web accessibility is crucial for ensuring that websites are usable by people with disabilities, yet many websites fail to meet the Web Content Accessibility Guidelines (WCAG) 2.0 standards. Manually correcting accessibility violations is time-consuming and requires specialized knowledge. In this work, we present a vision-language model that automatically edits website HTML to address WCAG2 violations.
Our approach formulates accessibility correction as a supervised image-conditioned program synthesis task. The model learns to correct HTML given both the HTML source code and its visual rendering, enabling it to understand both the structural and visual aspects of web pages. We collected WebAccessVL, a new dataset with manually corrected accessibility violations, establishing paired training data for this task.
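To make the input–output format concrete, the following minimal sketch shows how one paired example could be assembled: the original HTML is rendered with a headless browser (Playwright is used here as one possible choice), and the resulting screenshot together with the source code forms the model input, while the manually corrected HTML serves as the supervision target. The function name and record layout are illustrative assumptions, not our actual data pipeline.

```python
from pathlib import Path
from playwright.sync_api import sync_playwright


def build_training_pair(original_html_path: str, corrected_html_path: str,
                        screenshot_path: str) -> dict:
    """Assemble one image-conditioned program-synthesis example (illustrative)."""
    original_html = Path(original_html_path).read_text(encoding="utf-8")
    corrected_html = Path(corrected_html_path).read_text(encoding="utf-8")

    # Render the original page to obtain the visual conditioning signal.
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1280, "height": 800})
        page.goto(Path(original_html_path).resolve().as_uri())
        page.screenshot(path=screenshot_path, full_page=True)
        browser.close()

    # Inputs: HTML source + rendering; target: the manually corrected HTML.
    return {
        "image": screenshot_path,
        "input_html": original_html,
        "target_html": corrected_html,
    }
```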
We propose a violation-conditioned VLM that additionally conditions on the WCAG2 violation count to guide the correction process. This conditioning helps the model focus on the specific accessibility issues that need to be addressed, leading to more targeted and effective corrections.
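As a hedged illustration of this conditioning, the violation count reported by an accessibility checker can be prepended to the textual side of the VLM input alongside the HTML; the prompt template and the helper name below are assumptions for exposition, not the exact format used in training.

```python
def build_conditioned_prompt(html: str, violation_count: int) -> str:
    """Compose the text input to the VLM, conditioned on the violation count.

    The template is illustrative; the actual training format may differ.
    """
    return (
        f"[WCAG2 violations: {violation_count}]\n"
        "Rewrite the following HTML so that all accessibility violations are "
        "resolved while preserving the visual appearance and content.\n\n"
        f"{html}"
    )


# Example usage: the count from an accessibility checker guides the edit.
prompt = build_conditioned_prompt("<html>...</html>", violation_count=5)
```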
To address the lack of publicly available datasets for training models to refine webpage HTML to follow accessibility guidelines, we collected WebAccessVL, a comprehensive dataset of paired HTML files with manually corrected accessibility violations.
Dataset Construction: We randomly sampled 2,500 websites from a large-scale HTML dataset, ensuring that all essential assets (images, icons) were available and saved locally for reproducibility. Each HTML file was manually corrected by an annotator with advanced computer science expertise, who modified the code and repeatedly rendered the page to keep its visual design consistent with the original version. The annotator used IBM's industrial-grade accessibility checker to verify the corrections and minimize residual violations to the best of their ability.
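For orientation, the sketch below shows one way to count violations on a locally saved page. It substitutes the open-source axe-core engine, injected via Playwright, for the IBM checker used during annotation, so the tool choice, CDN URL, and counting convention are assumptions for illustration only.

```python
from pathlib import Path
from playwright.sync_api import sync_playwright

# NOTE: the dataset was audited with IBM's accessibility checker; axe-core is
# used here only as a freely available stand-in for illustration.
AXE_CDN = "https://cdn.jsdelivr.net/npm/axe-core@4.9.1/axe.min.js"


def count_violations(html_path: str) -> int:
    """Render a saved page and count violating elements across all rules."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(Path(html_path).resolve().as_uri())
        page.add_script_tag(url=AXE_CDN)
        # axe.run() returns a promise; Playwright awaits it automatically.
        results = page.evaluate("() => axe.run()")
        browser.close()
    return sum(len(v["nodes"]) for v in results["violations"])
```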
Our analysis reveals that 35.8% of violations involve vision-related factors requiring visual understanding, while 64.2% are purely language-based. This distribution highlights the importance of incorporating visual information in addition to HTML source code.
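The split between vision-related and language-based violations can be computed by bucketing checker rule identifiers, as in the sketch below; the rule list shown is an illustrative example, not the exact taxonomy behind the 35.8%/64.2% figures.

```python
# Illustrative set of rule IDs whose repair requires inspecting the rendering;
# the actual categorization used in our analysis may differ.
VISION_RELATED_RULES = {"image-alt", "color-contrast", "link-in-text-block"}


def vision_share(violations: list[dict]) -> float:
    """Fraction of violations that involve vision-related factors."""
    if not violations:
        return 0.0
    vision = sum(1 for v in violations if v["id"] in VISION_RELATED_RULES)
    return vision / len(violations)
```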
Our experiments demonstrate the effectiveness of our approach in reducing accessibility violations while maintaining visual fidelity: