news 2026/6/15 17:51:11

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

张小明

Authors: Qing’an Liu, Juntong Feng, Yuhao Wang, Xinzhe Han, Yujie Cheng, Yue Zhu, Haiwen Diao, Yunzhi Zhuge, Huchuan Lu

Original Abstract: Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text queries. In real-world scenarios, language also frequently appears as visualized text embedded in images, raising the question of whether current VLMs handle such input requests comparably. We introduce VISTA-Bench, a systematic benchmark from multimodal perception, reasoning, to unimodal understanding domains. It evaluates visualized text understanding by contrasting pure-text and visualized-text questions under controlled rendering conditions. Extensive evaluation of over 20 representative VLMs reveals a pronounced modality gap: models that perform well on pure-text queries often degrade substantially when equivalent semantic content is presented as visualized text. This gap is further amplified by increased perceptual difficulty, highlighting sensitivity to rendering variations despite unchanged semantics. Overall, VISTA-Bench provides a principled evaluation framework to diagnose this limitation and to guide progress toward more unified language representations across tokenized text and pixels. The source dataset is available at https://github.com/QingAnLiu/VISTA-Bench.

PDF Link: 2602.04802v1
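
To make the "modality gap" in the abstract concrete, here is a minimal sketch of how such a gap could be computed from paired results, where each question is answered once as pure text and once as visualized text. This is an illustration only, not the authors' evaluation code: the `PairedResult` record, the `modality_gap` helper, and the sample data are all hypothetical, and the actual benchmark may define its metrics differently.

```python
from dataclasses import dataclass


@dataclass
class PairedResult:
    """Hypothetical record: one question answered in both input forms."""
    question_id: str
    correct_pure_text: bool    # correct when the question was given as tokenized text
    correct_visualized: bool   # correct when the same question was rendered into an image


def accuracy(flags):
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def modality_gap(results):
    """Return (pure-text accuracy, visualized-text accuracy, difference)."""
    pure = accuracy(r.correct_pure_text for r in results)
    visual = accuracy(r.correct_visualized for r in results)
    return pure, visual, pure - visual


# Made-up records for illustration; these are not results from the paper.
results = [
    PairedResult("q1", True, True),
    PairedResult("q2", True, False),
    PairedResult("q3", True, False),
    PairedResult("q4", False, False),
]

pure, visual, gap = modality_gap(results)
print(f"pure-text={pure:.2f} visualized={visual:.2f} gap={gap:.2f}")
# prints "pure-text=0.75 visualized=0.25 gap=0.50"
```

Pairing the two forms of each question keeps the semantic content fixed, so any difference in accuracy can be attributed to the input modality and not to question difficulty.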
