
After a long quiet period, Zhipu AI finally open-sourced its upgraded vision-language model GLM-4.5V in August 2025. Built on the text foundation model GLM-4.5-Air (106B parameters, 12B activated) and continuing the GLM-4.1V-Thinking technical line, it achieves SOTA results among open-source models of the same scale across multiple public visual multimodal benchmarks, covering images, video, document understanding, and GUI agent tasks.
The model also performs well in real-world scenarios. Thanks to efficient hybrid training, GLM-4.5V can handle a wide range of visual understanding and reasoning tasks, such as:
- Image reasoning: scene understanding, complex multi-image analysis, spatial localization
- Complex charts and long documents: research-report analysis, information extraction
- Grounding: precise localization of visual elements
- Video understanding: long-video shot segmentation, event recognition
- GUI tasks: screen reading, icon recognition, desktop-operation assistance
In addition, the model introduces a "thinking mode" switch, letting users choose between fast responses and deep reasoning to balance efficiency and quality.
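As an illustration of the grounding capability above: the GLM-4.1V/4.5V series reports bounding boxes wrapped in special `<|begin_of_box|>…<|end_of_box|>` tokens, with coordinates normalized to a 0–1000 range. A minimal parser sketch (the token names and coordinate convention are assumptions taken from the model series' documented output format; verify against your model version):

```python
import re

# Assumed format: "<|begin_of_box|>[[x1,y1,x2,y2]]<|end_of_box|>",
# coordinates normalized to 0-1000 regardless of image resolution.
BOX_RE = re.compile(r"<\|begin_of_box\|>(.*?)<\|end_of_box\|>")

def parse_boxes(text: str) -> list[tuple[int, int, int, int]]:
    """Extract (x1, y1, x2, y2) tuples from a grounding response."""
    boxes = []
    for span in BOX_RE.findall(text):
        nums = [int(n) for n in re.findall(r"\d+", span)]
        # group the flat number list into 4-tuples
        boxes.extend(tuple(nums[i:i + 4]) for i in range(0, len(nums) - 3, 4))
    return boxes

print(parse_boxes("The cat is at <|begin_of_box|>[[120,85,430,610]]<|end_of_box|>."))
```

Rescale the returned coordinates by `width / 1000` and `height / 1000` to map them back onto the original image.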

First, set up the environment:

```bash
# 1. Create a virtual environment
conda create -n sglang_env python=3.10
conda activate sglang_env

# 2. Install libraries and dependencies
pip3 install "sglang[all]>=0.5.0rc1"
pip install git+https://github.com/huggingface/transformers.git
```
Serve the model locally with SGLang:
```bash
python3 -m sglang.launch_server --model-path zai-org/GLM-4.5V \
  --tp-size 4 \
  --tool-call-parser glm45 \
  --reasoning-parser glm45 \
  --served-model-name glm-4.5v \
  --port 8000 --host 0.0.0.0
```
Note the following:
- Taking H100s as an example, at least 4 cards are needed to serve the full-precision model; specify them explicitly with `export CUDA_VISIBLE_DEVICES=0,1,2,3`.
- Deploying GLM-4.5V-FP8 reduces the required GPU memory accordingly.
- SGLang recommends the FA3 attention backend, which gives higher inference throughput and lower memory usage; enable it by adding `--attention-backend fa3 --mm-attention-backend fa3 --enable-torch-compile`.
- With SGLang, thinking mode is enabled by default for incoming requests. To disable it, add the parameter `extra_body={"chat_template_kwargs": {"enable_thinking": False}}`.
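The thinking switch from the last bullet travels in `extra_body`, which the OpenAI client forwards verbatim to the SGLang server as extra JSON fields. A small sketch (the helper name `thinking_mode` is ours, for illustration):

```python
# Builds the extra_body payload for the SGLang thinking-mode switch.
# SGLang passes chat_template_kwargs through to the model's chat template.
def thinking_mode(enabled: bool) -> dict:
    return {"chat_template_kwargs": {"enable_thinking": enabled}}

# Usage with the OpenAI client pointed at the local server:
#   client.chat.completions.create(model="glm-4.5v", messages=msgs,
#                                  extra_body=thinking_mode(False))
print(thinking_mode(False))
```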
API call scripts:
- Text + single image

```python
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://127.0.0.1:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

response = client.chat.completions.create(
    model="glm-4.5v",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the main content of this image."},
                {"type": "image_url", "image_url": {"url": "file:///your_img"}},
            ],
        }
    ],
    max_tokens=512,
    temperature=0.0,
)

print(response.choices[0].message.content.strip())

# With --reasoning-parser glm45, the chain of thought is returned in a
# separate reasoning_content field rather than in the main content.
reasoning_content = getattr(response.choices[0].message, "reasoning_content", None)
print("======")
print(reasoning_content.strip() if reasoning_content else "No reasoning_content field in response.")
```
- Text + multiple images

```python
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://127.0.0.1:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

response = client.chat.completions.create(
    model="glm-4.5v",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare the main differences between these two images."},
                {"type": "image_url", "image_url": {"url": "file:///your_img1"}},
                {"type": "image_url", "image_url": {"url": "file:///your_img2"}},
            ],
        }
    ],
    max_tokens=512,
    temperature=0.0,
)

print(response.choices[0].message.content.strip())

reasoning_content = getattr(response.choices[0].message, "reasoning_content", None)
print("======")
print(reasoning_content.strip() if reasoning_content else "No reasoning_content field in response.")
```
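The `file:///` URLs above only work when the server process can read that local path. When client and server run on different machines, a common alternative is to inline the image as a base64 data URL. A minimal sketch (the helper name `image_to_data_url` is ours):

```python
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Inline a local image file as a data URL for the image_url field."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Usage in a message:
# {"type": "image_url", "image_url": {"url": image_to_data_url("photo.jpg")}}
```

Note that base64 inflates the payload by roughly a third, so keep images reasonably sized.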
- With limited resources, you can choose the quantized GLM-4.5V-FP8 version; a local inference demo with Transformers:

```python
from transformers import AutoProcessor, Glm4vMoeForConditionalGeneration
from PIL import Image
import requests
import torch

# Load model and processor
model_id = "zai-org/GLM-4.5V-FP8"
model = Glm4vMoeForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Example image loading (replace with your image path or URL)
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# Prepare the prompt
prompt = "Describe this car in detail."
messages = [
    {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": prompt}]}
]

# apply_chat_template tokenizes the text and preprocesses the image in one call
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)
response = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```
Publisher: Ai探索者. Please credit the source when reposting: https://javaforall.net/270342.html (original link: https://javaforall.net)
