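Survey the literature on visual concept taxonomy trees in the video generation domain
Result: completed the literature review on visual concept trees. WebSearch is unavailable on the dev machine, so the arXiv API has to be queried directly.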
Use the aris skill to survey the literature related to the visual tree in '/inspire/ssd/project/video-generation/public/zzy/entity_tree/vis_concept_tree.py'
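I'll read the file first to understand how the visual concept tree is implemented, then use the research skill to find related literature.
Now let me also check the JSON data files in the directory to understand the actual structure of the visual concept tree.
I now understand this project: it is a hierarchical visual concept taxonomy system for video generation, covering entity categories (human, animal, object, scene, action, visual style), with an interactive tree visualization tool. Let me use the ARIS research skill to find related literature.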
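A minimal sketch of the kind of hierarchy described above, for illustration only. The six top-level category names come from the project; the nested layout, the sub-category and leaf names, and the `iter_paths` helper are hypothetical and do not reflect the actual JSON schema used by `vis_concept_tree.py`.

```python
from typing import Any, Iterator, Tuple

# Hypothetical example data: only the six top-level keys come from the project.
example_tree = {
    "human": {"occupation": ["chef", "doctor"]},
    "animal": {"mammal": ["dog", "cat"], "bird": ["eagle"]},
    "object": {},
    "scene": {"outdoor": ["beach"]},
    "action": {"sport": ["running"]},
    "visual_style": {"animation": ["anime"]},
}


def iter_paths(node: Any, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield every root-to-leaf path as a tuple of concept names."""
    if isinstance(node, dict):
        # Internal node: recurse into each named child.
        for name, child in node.items():
            yield from iter_paths(child, prefix + (name,))
    else:
        # Leaf list: each entry terminates one path.
        for leaf in node:
            yield prefix + (leaf,)
```

Flattening the tree into paths like this is one simple way to compare its categories against the dimensions used by published benchmarks.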
Base directory for this skill: /inspire/ssd/project/video-generation/public/openveo3/tools/cc-setup/claude-config/skills/research-lit

# Research Literature Review

Research topic: Find related work on: (1) Visual concept taxonomy/hierarchy for video generation and text-to-video models; (2) Structured visual concept trees with entity categories (human, animal, object, scene, action, visual_style) used for organizing training data in generative video models; (3) Hierarchical concept organization
I will now follow the research-lit workflow. Let me first check local sources, then search external databases.
No local papers found, and no ARIS fetch scripts in the skill directory. Let me search for the scripts in broader locations and then proceed with web searches.
No local fetch scripts available. I'll proceed with WebSearch to find relevant papers. Let me search across multiple relevant topics in parallel.
The web searches returned empty results. Let me try more targeted searches with different query formulations.
WebSearch is returning empty results. Let me try using the arXiv API directly and also try different search approaches.
Good, arXiv API is working. I found a few relevant papers. Let me now do more targeted searches for the specific subtopics most relevant to this project.
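For reference, a direct arXiv API query can be sketched roughly as follows. This is an illustrative sketch, not the exact commands run in this session: the endpoint `http://export.arxiv.org/api/query` and its `search_query`, `start`, and `max_results` parameters are the public arXiv API, while the helper names `build_query_url` and `parse_feed` and the choice of fields to extract are assumptions made for the example.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # the API returns an Atom feed


def build_query_url(terms, max_results=10):
    """Build a query URL that requires every term to match in any field."""
    query = " AND ".join(f'all:"{t}"' for t in terms)
    params = {"search_query": query, "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml):
    """Extract the title and arXiv identifier URL from each feed entry."""
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        ident = entry.findtext("atom:id", default="", namespaces=ATOM)
        # Titles are often wrapped across lines, so collapse the whitespace.
        results.append({"title": " ".join(title.split()), "id": ident.strip()})
    return results


# To fetch live results (requires network access):
# import urllib.request
# xml_text = urllib.request.urlopen(build_query_url(["text-to-video"])).read()
# papers = parse_feed(xml_text)
```

Keeping URL construction and feed parsing separate from the network call makes the parsing logic easy to check offline against a saved response.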
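Good. Now let me search some more specific areas: video understanding taxonomies, visual ontologies, structured-prompt video generation, and dataset organization.
I found some important papers. Now let me search in detail for the key papers I need, in particular T2V-CompBench, VidGen-1M, and LLM-Grounded Video Diffusion, along with some more foundational papers.
Now let me search for a few more specific papers, in particular on video generation, concept decomposition, and compositionality in T2V evaluation benchmarks.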
Now let me also fetch the GenMAC and Sora survey papers' details, and search for a couple more important areas.
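I have now collected comprehensive search results. Next, I will compile the final literature review.

---

## Literature Review: Visual Concept Trees

Based on your project's `vis_concept_tree.py` and its six entity trees (human, animal, object, scene, action, visual_style), I retrieved related literature through the arXiv API and grouped it by topic as follows:

### 1. Compositional Video Generation and Concept Decomposition

| Paper | Venue/Year | Method | Key Result | Relation to This Project |
|-------|-----------|--------|------------|--------------|
| **T2V-CompBench** (Sun et al., 2024) | arXiv 2407.14505 | Builds a compositional T2V evaluation benchmark that splits prompts into object/attribute/action/motion dimensions | The first benchmark to systematically evaluate the compositional ability of T2V models | **Most directly relevant**: its object-attribute-ac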