# Serving Qwen2.5-VL-7B-Instruct: FastAPI Wrapper, Swagger Docs, and Auth Integration

## 1. Project Background and Value

In real-world engineering, deploying a large model as a local tool is convenient but has serious limitations: it cannot be shared by multiple users, lacks a standardized interface, and is hard to integrate into existing systems. Wrapping Qwen2.5-VL-7B-Instruct as a service solves these problems.

Once served, the model exposes a unified API that supports concurrent multi-user access and integrates easily with other systems. Swagger documentation makes the interface transparent, and an authentication mechanism keeps the service secure. This deployment style is particularly well suited to team collaboration, product integration, and scaled-out applications.

A FastAPI-based service brings many advantages: asynchronous high performance, auto-generated API documentation, type checking, and dependency injection, all of which make the model service easier to develop and maintain.
## 2. Environment Setup and Dependencies

### 2.1 Base Requirements

Make sure the system meets the following requirements:

- Ubuntu 18.04+ or CentOS 7+
- Python 3.8–3.10
- CUDA 11.7+ and cuDNN 8+
- An NVIDIA driver compatible with the RTX 4090
- At least 50 GB of free disk space
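The disk-space requirement is easy to verify programmatically before downloading the model weights. A small stdlib check (the helper name and default threshold are our own, not part of the original setup) might look like:

```python
import shutil

def has_free_space(path: str = "/", required_gb: float = 50.0):
    """Return (ok, free_gb): whether `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb, free_gb

ok, free = has_free_space("/", 50.0)
print(f"free: {free:.1f} GB, sufficient: {ok}")
```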
### 2.2 Create a Virtual Environment

```bash
# Create and activate a virtual environment
python -m venv qwen_service_env
source qwen_service_env/bin/activate

# Install core dependencies
pip install fastapi uvicorn python-multipart
pip install transformers torch torchvision
pip install "python-jose[cryptography]" "passlib[bcrypt]"
pip install aiofiles
```
### 2.3 Prepare the Model

Place the Qwen2.5-VL-7B-Instruct model in the target directory:

```bash
mkdir -p /app/models/qwen2.5-vl-7b-instruct
# Copy the model files into this directory
```
## 3. Core FastAPI Service Implementation

### 3.1 Service Architecture

We use a layered architecture:

- Routing layer: handles HTTP requests and responses
- Service layer: core business logic
- Model layer: model loading and inference
- Utility layer: supporting functions (auth, logging, etc.)
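One directory layout that matches these layers (only `main.py`, `start_service.py`, `models/inference.py`, and `models/auth.py` appear later in this article; the rest is a suggested arrangement) could be:

```
qwen-service/
├── main.py                  # routing layer: FastAPI app and endpoints
├── start_service.py         # entry point
├── models/
│   ├── inference.py         # model layer: loading and inference
│   └── auth.py              # utility layer: JWT authentication
├── uploads/                 # saved image uploads
└── requirements.txt
```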
### 3.2 Main Application Structure

Create the main application file `main.py`:

```python
from fastapi import FastAPI, File, Form, UploadFile, HTTPException, Depends, status
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from fastapi.middleware.cors import CORSMiddleware
from typing import Optional
import os
import logging
from datetime import datetime, timedelta
from pydantic import BaseModel
import aiofiles

# Import custom modules
from models.inference import MultiModalInference
from models.auth import AuthHandler, ACCESS_TOKEN_EXPIRE_MINUTES

# Initialize the application
app = FastAPI(
    title="Qwen2.5-VL-7B-Instruct API Service",
    description="API service built on the Qwen2.5-VL-7B-Instruct multimodal "
                "large model, supporting mixed text-and-image interaction",
    version="1.0.0"
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Globals
model_instance = None
auth_handler = AuthHandler()

# Data models
class User(BaseModel):
    username: str
    disabled: Optional[bool] = None

class UserInDB(User):
    hashed_password: str

class Token(BaseModel):
    access_token: str
    token_type: str

class InferenceRequest(BaseModel):
    text_input: str
    image_path: Optional[str] = None

class InferenceResponse(BaseModel):
    result: str
    processing_time: float
    model_version: str
```

Note that `Form` is imported here because the upload route in section 4.2 needs it, and `ACCESS_TOKEN_EXPIRE_MINUTES` is imported for the token route in section 4.1.
### 3.3 Model Inference Module

Create `models/inference.py`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import time
import logging

logger = logging.getLogger(__name__)

class MultiModalInference:
    def __init__(self, model_path: str):
        self.model_path = model_path
        self.model = None
        self.tokenizer = None
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.load_model()

    def load_model(self):
        """Load the model and tokenizer."""
        try:
            logger.info(f"Loading model: {self.model_path}")
            start_time = time.time()

            self.tokenizer = AutoTokenizer.from_pretrained(
                self.model_path,
                trust_remote_code=True
            )
            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_path,
                torch_dtype=torch.float16,
                device_map="auto",
                trust_remote_code=True,
                use_flash_attention_2=True  # enable Flash Attention 2 optimization
            )

            load_time = time.time() - start_time
            logger.info(f"Model loaded in {load_time:.2f}s")
        except Exception as e:
            logger.error(f"Model loading failed: {str(e)}")
            # Fall back to standard loading
            try:
                logger.info("Retrying with standard loading...")
                self.model = AutoModelForCausalLM.from_pretrained(
                    self.model_path,
                    torch_dtype=torch.float16,
                    device_map="auto",
                    trust_remote_code=True
                )
                logger.info("Standard loading succeeded")
            except Exception as fallback_error:
                logger.error(f"Standard loading also failed: {str(fallback_error)}")
                raise fallback_error

    def process_image(self, image_path: str, max_size: int = 1024):
        """Preprocess an input image."""
        try:
            image = Image.open(image_path).convert("RGB")
            # Downscale large images to avoid exhausting GPU memory
            if max(image.size) > max_size:
                ratio = max_size / max(image.size)
                new_size = tuple(int(dim * ratio) for dim in image.size)
                image = image.resize(new_size, Image.Resampling.LANCZOS)
            return image
        except Exception as e:
            logger.error(f"Image processing failed: {str(e)}")
            raise

    def inference(self, text_input: str, image_path: str = None):
        """Run inference."""
        try:
            start_time = time.time()

            messages = [{"role": "user", "content": []}]
            if image_path:
                self.process_image(image_path)  # validate and downscale the file
                messages[0]["content"].append({"image": image_path})
            messages[0]["content"].append({"text": text_input})

            # Build the model input
            text = self.tokenizer.apply_chat_template(
                messages,
                tokenize=False,
                add_generation_prompt=True
            )

            # Generate a response
            with torch.no_grad():
                inputs = self.tokenizer(text, return_tensors="pt").to(self.device)
                outputs = self.model.generate(
                    **inputs,  # unpack input_ids and attention_mask
                    max_new_tokens=1024,
                    do_sample=True,
                    temperature=0.7,
                    top_p=0.9
                )
            response = self.tokenizer.decode(
                outputs[0][inputs.input_ids.shape[1]:],  # strip the prompt tokens
                skip_special_tokens=True
            )

            processing_time = time.time() - start_time
            logger.info(f"Inference finished in {processing_time:.2f}s")
            return response, processing_time
        except Exception as e:
            logger.error(f"Inference error: {str(e)}")
            raise
```
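The downscaling rule inside `process_image` can be isolated as a pure function, which makes the aspect-ratio math easy to unit-test without PIL or a GPU (the helper name is our own):

```python
def clamp_size(size, max_size=1024):
    """Scale a (width, height) pair so the longer edge is at most max_size,
    preserving aspect ratio; smaller images pass through unchanged."""
    w, h = size
    longest = max(w, h)
    if longest <= max_size:
        return (w, h)
    ratio = max_size / longest
    return (int(w * ratio), int(h * ratio))

print(clamp_size((2048, 1024)))  # -> (1024, 512)
print(clamp_size((800, 600)))    # -> (800, 600)
```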
### 3.4 Authentication Module

Create `models/auth.py`:

```python
from datetime import datetime, timedelta
from jose import JWTError, jwt
from passlib.context import CryptContext
from fastapi import HTTPException, status, Depends
from fastapi.security import OAuth2PasswordBearer
import os

# Key configuration (read from environment variables or a config file in production)
SECRET_KEY = os.getenv("SECRET_KEY", "your-secret-key-change-in-production")
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30

# Example user database (use a real database in production)
fake_users_db = {
    "admin": {
        "username": "admin",
        "hashed_password": "$2b$12$EixZaYVK1fsbw1ZfbX3OXePaWxn96p36WQoeG6Lruj3vjPGga31lW",  # "secret"
        "disabled": False,
    }
}

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

class AuthHandler:
    def verify_password(self, plain_password, hashed_password):
        return pwd_context.verify(plain_password, hashed_password)

    def get_password_hash(self, password):
        return pwd_context.hash(password)

    def get_user(self, username: str):
        if username in fake_users_db:
            return fake_users_db[username]

    def authenticate_user(self, username: str, password: str):
        user = self.get_user(username)
        if not user:
            return False
        if not self.verify_password(password, user["hashed_password"]):
            return False
        return user

    def create_access_token(self, data: dict, expires_delta: timedelta = None):
        to_encode = data.copy()
        if expires_delta:
            expire = datetime.utcnow() + expires_delta
        else:
            expire = datetime.utcnow() + timedelta(minutes=15)
        to_encode.update({"exp": expire})
        return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

    async def get_current_user(self, token: str = Depends(oauth2_scheme)):
        credentials_exception = HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Could not validate credentials",
            headers={"WWW-Authenticate": "Bearer"},
        )
        try:
            payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
            username: str = payload.get("sub")
            if username is None:
                raise credentials_exception
        except JWTError:
            raise credentials_exception
        user = self.get_user(username)
        if user is None:
            raise credentials_exception
        return user

    async def get_current_active_user(self, token: str = Depends(oauth2_scheme)):
        # Call get_current_user directly instead of via Depends: FastAPI cannot
        # inject `self` into a class method used as a nested dependency.
        current_user = await self.get_current_user(token)
        if current_user.get("disabled"):
            raise HTTPException(status_code=400, detail="Inactive user")
        return current_user
```
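To demystify what `python-jose` does inside `create_access_token` and `jwt.decode`, here is a dependency-free sketch of HS256 JWT signing and verification. This is illustrative only; keep using a vetted library such as `python-jose` in the actual service:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def encode_hs256(payload: dict, secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(secret.encode(), f"{header}.{body}".encode(),
                           hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def decode_hs256(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    expected = _b64url(hmac.new(secret.encode(), f"{header}.{body}".encode(),
                                hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(body))
    if payload.get("exp", float("inf")) < time.time():  # reject expired tokens
        raise ValueError("token expired")
    return payload

token = encode_hs256({"sub": "admin", "exp": time.time() + 60}, "demo-secret")
print(decode_hs256(token, "demo-secret")["sub"])  # -> admin
```

The token is just two base64url-encoded JSON segments plus an HMAC-SHA256 signature, which is why leaking `SECRET_KEY` lets anyone forge valid tokens.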
## 4. API Route Design and Implementation

### 4.1 Authentication Routes

```python
# Add the following routes in main.py
@app.post("/token", response_model=Token)
async def login_for_access_token(form_data: OAuth2PasswordRequestForm = Depends()):
    user = auth_handler.authenticate_user(form_data.username, form_data.password)
    if not user:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect username or password",
            headers={"WWW-Authenticate": "Bearer"},
        )
    access_token_expires = timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    access_token = auth_handler.create_access_token(
        data={"sub": user["username"]},
        expires_delta=access_token_expires
    )
    return {"access_token": access_token, "token_type": "bearer"}
```
### 4.2 Model Inference Routes

```python
@app.post("/inference", response_model=InferenceResponse)
async def run_inference(
    request: InferenceRequest,
    current_user: dict = Depends(auth_handler.get_current_active_user)
):
    """
    Run multimodal inference.

    - text_input: text prompt
    - image_path: path to an image file (optional)
    """
    try:
        if not model_instance:
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="Model not loaded"
            )
        result, processing_time = model_instance.inference(
            request.text_input,
            request.image_path
        )
        return InferenceResponse(
            result=result,
            processing_time=processing_time,
            model_version="Qwen2.5-VL-7B-Instruct"
        )
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Inference error: {str(e)}"
        )

@app.post("/inference/upload")
async def upload_image_and_inference(
    file: UploadFile = File(...),
    text_input: str = Form(...),
    current_user: dict = Depends(auth_handler.get_current_active_user)
):
    """
    Upload an image and run inference.

    - file: image file
    - text_input: text prompt
    """
    try:
        # Save the uploaded file
        upload_dir = "uploads"
        os.makedirs(upload_dir, exist_ok=True)
        file_path = os.path.join(upload_dir, file.filename)
        async with aiofiles.open(file_path, 'wb') as out_file:
            content = await file.read()
            await out_file.write(content)

        # Run inference
        result, processing_time = model_instance.inference(text_input, file_path)
        return InferenceResponse(
            result=result,
            processing_time=processing_time,
            model_version="Qwen2.5-VL-7B-Instruct"
        )
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Inference error: {str(e)}"
        )
```
### 4.3 Health Check and Monitoring Routes

```python
@app.get("/health")
async def health_check():
    """Service health check."""
    return {
        "status": "healthy",
        "model_loaded": model_instance is not None,
        "timestamp": datetime.now().isoformat()
    }

@app.get("/model-info")
async def get_model_info(current_user: dict = Depends(auth_handler.get_current_active_user)):
    """Return model information."""
    if not model_instance:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Model not loaded"
        )
    return {
        "model_name": "Qwen2.5-VL-7B-Instruct",
        "device": model_instance.device,
        "model_path": model_instance.model_path
    }
```
## 5. Service Startup and Configuration

### 5.1 Startup Script

Create the startup script `start_service.py`:

```python
import uvicorn
import logging
import os

import main
from main import app
from models.inference import MultiModalInference

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

def startup_event():
    """Handle the startup event."""
    try:
        # Initialize the model
        model_path = os.getenv("MODEL_PATH", "/app/models/qwen2.5-vl-7b-instruct")
        # Assign on the main module so the routes in main.py see the instance;
        # a bare `global model_instance` here would only set this module's global.
        main.model_instance = MultiModalInference(model_path)
        logging.info("Service started; model loaded successfully")
    except Exception as e:
        logging.error(f"Service startup failed: {str(e)}")
        raise

if __name__ == "__main__":
    # Register the startup event
    app.add_event_handler("startup", startup_event)
    # Start the service
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8000,
        reload=False,  # keep False in production
        workers=1      # multiple workers can conflict over GPU memory
    )
```
### 5.2 Environment Configuration

Create a `.env` file:

```bash
# Model path
MODEL_PATH=/app/models/qwen2.5-vl-7b-instruct

# Security
SECRET_KEY=your-very-secure-secret-key-change-in-production
ACCESS_TOKEN_EXPIRE_MINUTES=30

# Service
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=INFO
```
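Note that neither FastAPI nor uvicorn reads this `.env` file automatically. The `python-dotenv` package is the usual choice; as a sketch of what it does (this loader is our own, not part of the project code), a minimal stdlib version would be:

```python
import os

def load_env(path: str = ".env") -> None:
    """Populate os.environ from simple KEY=VALUE lines, skipping blank
    lines and comments; existing variables are not overwritten."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Calling `load_env()` at the top of `start_service.py`, before `os.getenv("MODEL_PATH", ...)`, would make the file take effect.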
### 5.3 Docker Deployment

Create a `Dockerfile`:

```dockerfile
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

# Set the working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y python3.10 python3-pip python3.10-venv \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY requirements.txt .
COPY . .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Create the model directory
RUN mkdir -p /app/models

# Expose the port
EXPOSE 8000

# Start command
CMD ["python", "start_service.py"]
```

Create `docker-compose.yml`:

```yaml
version: '3.8'
services:
  qwen-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/app/models/qwen2.5-vl-7b-instruct
      - SECRET_KEY=your-production-secret-key
    volumes:
      - ./models:/app/models
      - ./uploads:/app/uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
```
## 6. Usage Guide and API Testing

### 6.1 Start the Service

```bash
# Start directly
python start_service.py

# Or with Docker
docker-compose up -d
```

Once the service is running, it can be accessed at:

- API docs: http://localhost:8000/docs
- Health check: http://localhost:8000/health

### 6.2 Obtain an Access Token

First, request an access token:

```bash
curl -X POST "http://localhost:8000/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=admin&password=secret"
```

Example response:

```json
{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer"
}
```
### 6.3 API Call Examples

Text inference:

```bash
curl -X POST "http://localhost:8000/inference" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text_input": "描述一下这张图片的内容",
    "image_path": "/path/to/your/image.jpg"
  }'
```

Upload an image and run inference:

```bash
curl -X POST "http://localhost:8000/inference/upload" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -F "file=@/path/to/your/image.jpg" \
  -F "text_input=描述这张图片的内容"
```
### 6.4 Python Client Example

```python
import requests

class QwenClient:
    def __init__(self, base_url, username, password):
        self.base_url = base_url
        self.token = self._get_token(username, password)
        self.headers = {"Authorization": f"Bearer {self.token}"}

    def _get_token(self, username, password):
        response = requests.post(
            f"{self.base_url}/token",
            data={"username": username, "password": password}
        )
        return response.json()["access_token"]

    def inference(self, text_input, image_path=None):
        if image_path:
            # Upload the file
            with open(image_path, 'rb') as f:
                files = {'file': f}
                data = {'text_input': text_input}
                response = requests.post(
                    f"{self.base_url}/inference/upload",
                    files=files,
                    data=data,
                    headers=self.headers
                )
        else:
            # Plain JSON request
            payload = {"text_input": text_input}
            response = requests.post(
                f"{self.base_url}/inference",
                json=payload,
                headers=self.headers
            )
        return response.json()

# Usage
client = QwenClient("http://localhost:8000", "admin", "secret")
result = client.inference("描述这张图片", "path/to/image.jpg")
print(result)
```
## 7. Summary

In this walkthrough we wrapped the Qwen2.5-VL-7B-Instruct model as a standard API service with the following characteristics.

Core functionality:

- Full text-and-image multimodal inference
- Flash Attention 2 optimization
- Automatic image preprocessing and GPU-memory protection

Service features:

- RESTful API design
- Auto-generated Swagger documentation
- JWT token authentication
- Health-check and monitoring endpoints

Deployment convenience:

- Docker containerization
- Environment-variable-based configuration
- Production-ready settings

Developer experience:

- Detailed API documentation
- Multiple invocation styles
- Client code example

This service-oriented approach lets Qwen2.5-VL-7B-Instruct be integrated into a wide range of applications, giving teams and products strong multimodal AI capabilities. Developers can now use this powerful vision-language model through simple API calls without worrying about model loading and inference details.