ALVR 2026: 4th Workshop on Advances in Language and Vision Research
Link: https://alvr-workshop.github.io/

Call For Papers

Co-located with ACL 2026 in San Diego, California, United States & Online | July 2–7, 2026
Contact email: alvr_workshop_acl_2026@googlegroups.com

# About the Workshop

Language & Vision research has rapidly evolved in recent years, driven by the emergence of large vision-language models (LVLMs). The 4th Workshop on Advances in Language and Vision Research (ALVR) will bring together researchers to explore the frontier of multimodal learning, foster collaboration, and identify pressing open challenges. The program will feature keynote talks, spotlight paper presentations, poster sessions, and a panel discussion.

# Important Dates (tentative)

- Direct submission deadline: March 5, 2026
- Pre-reviewed (ARR) commitment deadline: March 24, 2026
- Notification of acceptance: April 28, 2026
- Camera-ready papers due: May 12, 2026
- Workshop date: July 2 or 3, 2026

All deadlines are 11:59 pm UTC-12 (anywhere on Earth).

# Submission Information

We accept the following types of submissions:

- Long papers: up to 8 pages (plus references and appendix)
- Short papers: up to 4 pages (plus references and appendix)

Final versions will be given one additional page of content so that reviewers' comments can be taken into account. Authors will have the option to provide a link to the relevant arXiv paper.

We also offer a non-archival track to allow dual submission of work to ALVR 2026 and other conferences or journals. Space permitting, these submissions will still be presented at the workshop and hosted on the workshop website, but they will not be included in the official proceedings. Please use the ACL format and submit through OpenReview, indicating that the paper is a cross-submission (non-archival) at the bottom of the submission form.

# Topics of Interest

This workshop covers (but is not limited to) the following topics:

- Self-supervised vision and language pre-training
- New tasks and datasets that provide real-world solutions in language and vision
- Text-to-image/video generation and text-guided image/video editing
- 3D/spatial reasoning and inference with language and vision
- Multimodal agents and language-grounded embodied agents
- Visually grounded natural language understanding and generation
- Culturally aware LVLMs and LVLMs for underrepresented cultures
- Multilingual LVLMs
- External knowledge integration in visual and language understanding
- Shortcomings of existing LVLMs on downstream tasks and solutions to address them
- Training efficiency and optimization of LVLMs
- Post-training frameworks for LVLMs, including alignment and reasoning
- Ethics and bias in LVLMs
- Multidisciplinary studies that may involve linguistics, cognitive science, robotics, etc.
- Practical applications of LVLMs
- Explainability and interpretability of LVLMs

# Organizers

- Qianqi (Jackie) Yan (UC Santa Barbara)
- Syrielle Montariol (CNRS, Sorbonne Université)
- Yue Fan (UC Santa Cruz)
- Jing Gu (xAI)
- Jiayi Pan (xAI)
- Manling Li (Northwestern University)
- Parisa Kordjamshidi (Michigan State University)
- Alane Suhr (UC Berkeley)
- Xin Eric Wang (UC Santa Barbara)