修复 Qwen API 重复输出问题以恢复视频概念标注
结果:## 总结 **脚本已写好**:`/inspire/qb-ilm/project/video-generation/public/datasets/concept_tree/label-explorer/action_tagger_l2l3.py` **但 API 有严重问题**:新的 Qwen API endpoint (`dcbooqcdqqcecjhmjk9bqebc9jkoaqhk...`) 上部署的 Qwen3.5-27B 存在严重的输出退化问题——所有响应都陷入重复循环(如 "vehicle vehicle vehicle..." 或 "social interaction, social interaction, ...")。无论使用什么参数(`enable_thinking` true/false、不同 temperature、有无 `chat_template_kwargs`)都一样。 这是 **模型部署/服务端问题**,不是脚本问题。脚本代码是正确的。 **你需要**: 1. 检查新的 sglang 服务配置(可能需要调整 repetition_penalt
用户原始 query
形成我要打标/inspire/qb-ilm/project/video-generation/public/datasets/concept_tree/concept_tree_20260401/action.json l2 l3 之前打标流程是MoVA v2 数据打标(视觉概念标注) streamlit run /inspire/qb-ilm/project/video-generation/public/datasets/video-concept/label-explorer/preview_app.py --server.port 8521 --server.headless true https://nat2-notebook-inspire.sii.edu.cn/ws-6e6ba362-e98e-45b2-9c5a-311998e93d65/project-7e0957fb-eaa7-4ded-8dca-dd508b2ae01d/user-ef4936dd-0231-4485-ba30-34e92bf3ea53/vscode/<CARD>efd-84fd-43fc2db17b42/2126a854-2749-48db-8db9-8d8465bad303/proxy/8521/ 背景 用视觉概念分类树对 MoVA v2 预训练数据集进行打标,为后续按概念平衡采样和概念维度质量过滤提供结构化标签。 数量统计 打标输入: /inspire/qb-ilm/project/video-generation/public/pipeline/lance/20260206_pretrian/sample50000.jsonl - 50,000 条,按比例从各 Lance 数据集采样(bilibili / youtube / drama 等 24 个来源) - 每条含:video_path、prompt(视频详细描述)、input_image(第一帧截图) 打标输出: /inspire/qb-ilm/project/video-generation/public/pipeline/lance/20260206_pretrian/sample50000_concepts.jsonl 每条新增字段: 字段 说明 示例 entity_L2 Entity 概念列表(L1/L2 格式) ["Person/Person Role", "Object/Electronic Device"] action_L2 Action 概念列表(L1/L2 格式) ["Body Movement/Gesture & Expression"] quality_issues 质量问题标签 ["ui_overlay", "watermark"] content_type 视频内容类型 gameplay / tutorial / vlog / drama / … quality_score 训练质量评分 1-5 3(1=不可用,5=优质) tag_ok 打标是否成功 true 打标的提示词: SYSTEM_PROMPT = """You label videos using a fixed taxonomy. Output JSON only, no markdown, no extra text. ENTITY_L2 valid values (use EXACT strings): Animal/Animal Species | Concept/Culture & Entertainment | Concept/Sports & Competition | Natural Element/Natural Phenomenon | Object/Container & Bag | Object/Electronic Device | Object/Everyday Object | Object/Food & Drink | Object/Furniture & Fixture | Object/Musical Instrument | Object/Other Object | Object/Sports Equipment | Object/Tool & Instrument | Person/Body Part | Person/Clothing & Accessory | Person/Person Role | Scene/Indoor Place | Scene/Outdoor Place | Vehicle/Vehicle Type | Visual Attribute/Abstract Attribute | Visual Attribute/Color & Pattern | Visual Attribute/Material & Texture ACTION_L2 valid values (use EXACT strings): Body Movement/Dance | Body Movement/Dressing | Body Movement/Eating & Drinking | Body Movement/Fitness & Exercise | Body Movement/Gesture & Expression | Body Movement/Locomotion | Body Movement/Medical & Health | Body Movement/Musical Performance | Body Movement/Outdoor & Leisure | Body Movement/Performance & Entertainment | Body Movement/Personal Care & Grooming | Body Movement/Posture & Rest | Body Movement/Sport & Athletic | Camera Motion/Camera Operation | Object Manipulation/Animal & Nature Interaction | Object Manipulation/Cleaning & Maintenance | Object Manipulation/Construction & Repair | Object Manipulation/Cooking & Food Prep | Object Manipulation/Cover & Wrap | Object Manipulation/Craft & Art Creation | Object Manipulation/Directional Push/Pull | Object Manipulation/Insertion & Removal | Object Manipulation/Lift & Drop | Object Manipulation/Open & Close | Object Manipulation/Placement & Arrangement | Object Manipulation/Pour & Spill | Object Manipulation/Pretend Action | Object Manipulation/Rotate & Tilt | Object Manipulation/Technology & Device Use | Object Manipulation/Touch & Hold | Object Manipulation/Transform & Deform | Scene Dynamics/Object State Change | Social Interaction/Communication | Social Interaction/Conflict | Social Interaction/Cooperative Action | Social Interaction/Physical Contact quality_issues valid values: ui_overlay | watermark | subtitle_heavy | static_scene | ad_content | off_topic | low_resolution | nsfw content_type valid values: gameplay | tutorial | vlog | documentary | entertainment | sports | music_performance | cooking | drama | animation | news | other quality_score: integer 1-5 (1=unusable, 5=excellent for video generation training) Output format: {"entity_L2":[<3-5 items>],"action_L2":[<1-3 items>],"quality_issues":[<0-3 items>],"content_type":"<one value>","quality_score":<1-5>}""" 动态的提示词版本 def load_taxonomy(): """Load entity and action L2 categories from CSV files.""" import csv TAXONOMY_DIR = "/inspire/qb-ilm/project/video-generation/public/datasets/video-concept" entity_l2 = [] action_l2 = [] # Try to load from CSV, fallback to hardcoded if not available try: with open(f"{TAXONOMY_DIR}/entity_taxonomy.csv", "r") as f: reader = csv.DictReader(f) for row in reader: if "L2" in row: entity_l2.append(row["L2"]) except FileNotFoundError: entity_l2 = [ "Animal/Animal Species", "Concept/Culture & Entertainment", "Concept/Sports & Competition", "Natural Element/Natural Phenomenon", "Object/Container & Bag", "Object/Electronic Device", "Object/Everyday Object", "Object/Food & Drink", "Object/Furniture & Fixture", "Object/Musical Instrument", "Object/Other Object", "Object/Sports Equipment", "Object/Tool & Instrument", "Person/Body Part", "Person/Clothing & Accessory", "Person/Person Role", "Scene/Indoor Place", "Scene/Outdoor Place", "Vehicle/Vehicle Type", "Visual Attribute/Abstract Attribute", "Visual Attribute/Color & Pattern", "Visual Attribute/Material & Texture", ] try: with open(f"{TAXONOMY_DIR}/action_taxonomy.csv", "r") as f: reader = csv.DictReader(f) for row in reader: if "L2" in row: action_l2.append(row["L2"]) except FileNotFoundError: action_l2 = [ "Body Movement/Dance", "Body Movement/Dressing", "Body Movement/Eating & Drinking", "Body Movement/Fitness & Exercise", "Body Movement/Gesture & Expression", "Body Movement/Locomotion", "Body Movement/Medical & Health", "Body Movement/Musical Performance", "Body Movement/Outdoor & Leisure", "Body Movement/Performance & Entertainment", "Body Movement/Personal Care & Grooming", "Body Movement/Posture & Rest", "Body Movement/Sport & Athletic", "Camera Motion/Camera Operation", "Object Manipulation/Animal & Nature Interaction", "Object Manipulation/Cleaning & Maintenance", "Object Manipulation/Construction & Repair", "Object Manipulation/Cooking & Food Prep", "Object Manipulation/Cover & Wrap", "Object Manipulation/Craft & Art Creation", "Object Manipulation/Directional Push/Pull", "Object Manipulation/Insertion & Removal", "Object Manipulation/Lift & Drop", "Object Manipulation/Open & Close", "Object Manipulation/Placement & Arrangement", "Object Manipulation/Pour & Spill", "Object Manipulation/Pretend Action", "Object Manipulation/Rotate & Tilt", "Object Manipulation/Technology & Device Use", "Object Manipulation/Touch & Hold", "Object Manipulation/Transform & Deform", "Scene Dynamics/Object State Change", "Social Interaction/Communication", "Social Interaction/Conflict", "Social Interaction/Cooperative Action", "Social Interaction/Physical Contact", ] return entity_l2, action_l2 def build_system_prompt(): """Build system prompt dynamically from taxonomy CSV files.""" entity_l2, action_l2 = load_taxonomy() entity_str = " | ".join(entity_l2) action_str = " | ".join(action_l2) return f"""You label videos using a fixed taxonomy. Output JSON only, no markdown, no extra text. ENTITY_L2 valid values (use EXACT strings): {entity_str} ACTION_L2 valid values (use EXACT strings): {action_str} quality_issues valid values: ui_overlay | watermark | subtitle_heavy | static_scene | ad_content | off_topic | low_resolution | nsfw | static_video content_type valid values: gameplay | tutorial | vlog | documentary | entertainment | sports | music_performance | cooking | drama | animation | news | other quality_score: integer 1-5 (1=unusable, 5=excellent for video generation training) Output format: {{"entity_L2":[<3-5 items>],"action_L2":[<1-3 items>],"quality_issues":[<0-3 items>],"content_type":"<one value>","quality_score":<1-5>}}""" 打标结果统计 全量 50,000 条,100% 完成,0 失败,耗时 122 分钟 质量分分布: 质量分 数量 占比 说明 5 优质 5,900 11.8% 无明显问题,高训练价值 4 良好 28,566 57.1% 主体部分 3 可用 13,996 28.0% 有瑕疵但可用 2 较差 1,535 3.1% 建议过滤 1 不可用 3 ~0% 内容类型分布(Top 10): vlog(9705) > drama(7879) > entertainment(6793) > tutorial(5742) > gameplay(3708) > music_performance(3688) > documentary(3253) > sports(2968) > animation(2420) > other(1599) 质量问题分布: - ui_overlay(直播遮挡):23,777 条(47.6%) - watermark:17,784 条(35.6%) - subtitle_heavy(字幕密集):17,242 条(34.5%) - static_scene:3,454 条(6.9%) - low_resolution:1,447 条(2.9%) - ad_content:669 条(1.3%) 现在启动的qwen的推理服务是export INF_API_KEY="m9/aCBFAy4sHzMD27018GKgHqCunpP+UtZawVRGg1O4=" curl https://dcbooqcdqqcecjhmjk9bqebc9jkoaqhk.openapi-qb.sii.edu.cn \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $INF_API_KEY" \
意图
修复 Qwen API 重复输出问题以恢复视频概念标注
## 总结 **脚本已写好**:`/inspire/qb-ilm/project/video-generation/public/datasets/concept_tree/label-explorer/action_tagger_l2l3.py` **但 API 有严重问题**:新的 Qwen API endpoint (`dcbooqcdqqcecjhmjk9bqebc9jkoaqhk...`) 上部署的 Qwen3.5-27B 存在严重的输出退化问题——所有响应都陷入重复循环(如 "vehicle vehicle vehicle..." 或 "social interaction, social interaction, ...")。无论使用什么参数(`enable_thinking` true/false、不同 temperature、有无 `chat_template_kwargs`)都一样。 这是 **模型部署/服务端问题**,不是脚本问题。脚本代码是正确的。 **你需要**: 1. 检查新的 sglang 服务配置(可能需要调整 repetition_penalt
关键步骤
- 步骤 1让我检查一下概念树和现有的打标流程,以了解需要做什么。
- 步骤 2我现在对情况有了全面的了解。在制定计划之前,让我澄清一些事情。
- 步骤 3好的,我已了解您的回答: - **输入数据**:相同的5万个样本 - **范围**:仅限动作,使用新的 `action.json` 分类体系进行 L2+L3 级别标注 - **结果**:保存到新文件,避免与现有 `sample50000_concepts.jsonl` 冲突 现在我将设计方案。
- 步骤 4现在,我来阅读现有的 tagger,以便最终敲定计划。
- 步骤 5好的,我已经有了现有的 tagger 代码。现在让我来写最终的计划。
- 步骤 6计划如下: **新脚本**: `action_tagger_l2l3.py`,基于现有 `concept_tagger.py` 修改 - 从 `action.json` 动态加载 12 L2 / 70 L3 分类树 - 层级式 prompt(L1 > L2 > L3 分组展示) - 单次 API 调用输出 `action_L2` + `action_L3` - 新 API endpoint(旧的已下线) - 输出到 `sample50000_action_l2l3.jsonl`(新文件,不冲突) - 带 tag 验证(防模型幻觉)、断点续传、20 wo
- 步骤 7正在编写脚本。首先让我准确地读取 `action.json` 文件,以获取确切的结构。
- 步骤 8现在我有了确切的 `action.json` 结构。让我来编写脚本。
- 步骤 9现在上传到远程服务器并用5个条目进行测试。
- 步骤 10SSH 连接失败,但该脚本已直接写入正确的路径。让我验证它是否可访问,并直接运行测试。
- 步骤 11文件系统是共享的——所有文件都直接在此机器上。无需 SSH。让我用 `--limit 5` 测试一下。
- 步骤 12全部5个API调用都失败了。让我调试一下API连接。
- 步骤 13API 可以从 `curl` 访问。问题出在 `Python` 的 `urllib` 中。让我调试一下 `Python` 的 `API` 调用。
- 步骤 14API 有效,但我之前的测试提示太简单了。模型使用了实际的系统提示应该能输出有效的 JSON。让我清理一下测试数据,并用真实数据运行。
- 步骤 15我需要添加调试输出以查看实际的错误。让我更新脚本以打印错误。
- 步骤 16模型正在生成重复/损坏的输出。这是一个重复退化问题。我来尝试使用不同的参数——`max_tokens` 设置得太低了,并且 thinking 模式可能需要调整。
- 步骤 17模型的 JSON 输出格式不正确(`,""` 而不是 `,"`)。这是新 endpoint 上的一个 `chat_template_kwargs` 问题。让我尝试不同的参数。
- 步骤 18模型在这两个配置中都出现了重复退化。我来尝试在用户消息中使用 Qwen 风格的 `/no_think` 标签。
- 步骤 19新的 API endpoint (`dcbooqcdqqcecjhmjk9bqebc9jkoaqhk...`) 存在一个严重问题:Qwen3.5-27B 模型正在生成重复的、退化的输出,而不管参数设置如何 (`enable_thinking`、`temperature` 等)。所有的响应都包含重复的 token,例如 "vehicle vehicle vehicle..." 或 "social interaction, social interaction, ..."。 这是服务器端的模型问题,而不是脚本问题。该脚本已正确实现。
- 步骤 20## 总结 **脚本已写好**:`/inspire/qb-ilm/project/video-generation/public/datasets/concept_tree/label-explorer/action_tagger_l2l3.py` **但 API 有严重问题**:新的 Qwen API endpoint (`dcbooqcdqqcecjhmjk9bqebc9jkoaqhk...`) 上部署的 Qwen3.5-27B 存在严重的输出退化问题——所有响应都陷入重复循环(如 "vehicle vehicle vehicle..." 或 "
结果 / outcome
## 总结 **脚本已写好**:`/inspire/qb-ilm/project/video-generation/public/datasets/concept_tree/label-explorer/action_tagger_l2l3.py` **但 API 有严重问题**:新的 Qwen API endpoint (`dcbooqcdqqcecjhmjk9bqebc9jkoaqhk...`) 上部署的 Qwen3.5-27B 存在严重的输出退化问题——所有响应都陷入重复循环(如 "vehicle vehicle vehicle..." 或 "social interaction, social interaction, ...")。无论使用什么参数(`enable_thinking` true/false、不同 temperature、有无 `chat_template_kwargs`)都一样。 这是 **模型部署/服务端问题**,不是脚本问题。脚本代码是正确的。 **你需要**: 1. 检查新的 sglang 服务配置(可能需要调整 repetition_penalt
最新评分
暂无 judge 评分。
Q 状态
Q 历史
暂无 q 更新。