2026-01-01 22:57:55 +08:00

# 心镜 Agent - WebSocket Backend Implementation

> Implements the WebSocket interfaces for 2.1 (conversation triggered by an abnormal state) and 2.3 (bidirectional audio-stream conversation).

## Quick start

### 1. Install dependencies

```bash
uv add websockets  # already installed
```

### 2. Start the WebSocket server

```bash
python src/MainServices.py
```

The server listens on `ws://0.0.0.0:8765`. `0.0.0.0` is the bind address (all network interfaces); clients connect using the host's actual IP address, or `127.0.0.1` when running locally.

### 3. Test the interface

```bash
python test_ws.py
```
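
`test_ws.py` itself is not reproduced here. As a rough idea of what a manual smoke test could look like, the sketch below sends one JSON message and waits for a single reply. It assumes the `websockets` package from step 1, a server reachable at `ws://127.0.0.1:8765`, and JSON text frames in both directions. `encode_message` and `round_trip` are hypothetical names for illustration, not part of the project. With the server running, call it as `asyncio.run(round_trip({"type": "abnormal_trigger", "trigger_reason": "sad_emotion", "enable_streaming": True}))`.

```python
import asyncio
import json


def encode_message(message: dict) -> str:
    """Serialize a message dict to the JSON text sent over the socket."""
    return json.dumps(message, ensure_ascii=False)


async def round_trip(message: dict, uri: str = "ws://127.0.0.1:8765") -> dict:
    """Send one message and return the first reply, parsed as JSON."""
    # Imported lazily so encode_message stays usable without the package.
    import websockets

    async with websockets.connect(uri) as ws:
        await ws.send(encode_message(message))
        return json.loads(await ws.recv())
```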

### 4. Full API documentation

See [WEBSOCKET_API.md](./WEBSOCKET_API.md).

---

## 2. Agent conversation interface (WebSocket)

**WebSocket connection**: `ws://0.0.0.0:8765` (server bind address)

### 2.1 Conversation triggered by an abnormal user state

**Description**: When the K230 detects poor skin condition or a sad emotion, it triggers a proactive caring conversation from the Agent. The Agent side then assembles a prompt and produces a suitable spoken reply.

**K230 → Agent backend**:

```json
{
  "type": "abnormal_trigger",        // type: conversation triggered by an abnormal state
  "trigger_reason": "string",        // trigger reason, one of ["poor_skin", "sad_emotion"]
  "enable_streaming": true,          // whether to enable streaming responses (boolean)
  "context_data": {                  // optional context data
    "emotion": "sad",
    "skin_status": {
      "acne": true,
      "dark_circles": true
    },
    "timestamp": "2024-01-01 12:30:45"
  }
}
```
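
One way to assemble this request is sketched below. It checks `trigger_reason` against the two documented values and only includes `context_data` when it is supplied. The helper name `build_abnormal_trigger` is hypothetical, and the code targets standard CPython; code running on the K230 itself may need adapting.

```python
import json
from datetime import datetime

ALLOWED_REASONS = {"poor_skin", "sad_emotion"}


def build_abnormal_trigger(reason, context_data=None, enable_streaming=True):
    """Return the 2.1 request as a JSON string, rejecting unknown reasons."""
    if reason not in ALLOWED_REASONS:
        raise ValueError(f"unsupported trigger_reason: {reason!r}")
    message = {
        "type": "abnormal_trigger",
        "trigger_reason": reason,
        "enable_streaming": enable_streaming,
    }
    if context_data is not None:  # context_data is optional
        message["context_data"] = context_data
    return json.dumps(message, ensure_ascii=False)


payload = build_abnormal_trigger(
    "sad_emotion",
    context_data={
        "emotion": "sad",
        "skin_status": {"acne": True, "dark_circles": True},
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    },
)
```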

**Field descriptions**:

- `type`: fixed value "abnormal_trigger", indicating a trigger caused by an abnormal state
- `trigger_reason`: the reason for the trigger
  - "poor_skin": poor skin condition
  - "sad_emotion": sad emotion
- `enable_streaming`: whether to use streaming conversation (`true` is recommended)
- `context_data`: context information passed to the Agent

**Agent backend → K230 (response)**: after this response, the streaming audio recording and playback interface starts; the main logic is handled on the Agent side.

```json
{
  "type": "abnormal_trigger_response",
  "success": true
}
```
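
On the backend, the request/response pair could be handled roughly as follows. This is a sketch, not the project's actual handler: `handle_abnormal_trigger` is a hypothetical name, and returning `"success": false` for malformed or unknown requests is an assumption, since the document only shows the success case.

```python
import json


def handle_abnormal_trigger(raw: str) -> str:
    """Validate a 2.1 request and return the JSON response string."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        request = None
    ok = (
        isinstance(request, dict)
        and request.get("type") == "abnormal_trigger"
        and request.get("trigger_reason") in ("poor_skin", "sad_emotion")
    )
    # Assumption: invalid requests are answered with success=false.
    return json.dumps({"type": "abnormal_trigger_response", "success": ok})
```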

------

### 2.2 User-initiated conversation (not handled for now)

**Description**: The user starts a conversation with a wake word (e.g. "你好啊" or "心镜").

**K230 → Agent backend**:

```json
{
  "type": "user_initiated",          // type: user-initiated conversation
  "wake_word": "你好啊",              // the wake word that was detected
  "enable_streaming": true,          // whether to enable streaming responses
  "user_input": "string",            // optional, the user's initial input
  "timestamp": "2024-01-01 12:30:45"
}
```

**Field descriptions**:

- `type`: fixed value "user_initiated"
- `wake_word`: the detected wake word ("你好啊", "心镜", etc.)
- `enable_streaming`: whether to enable streaming conversation
- `user_input`: the user's initial question or statement (optional)
- `timestamp`: the time of the wake-up

**Agent backend → K230 (response)**: after this response, the streaming audio recording and playback interface starts; the main logic is handled on the Agent side.

```json
{
  "type": "user_initiated_response",
  "success": true
}
```

------

### 2.3 Bidirectional audio-stream conversation

**Description**: The K230 and the Agent backend exchange real-time audio in both directions over the same WebSocket connection.

**Handshake parameters sent after the connection is established**:

```json
{
  "type": "audio_stream_init",       // type: audio stream initialization
  "session_id": "string",            // conversation session ID (from 2.1 or 2.2)
  "audio_config": {
    "sample_rate": 16000,            // sample rate in Hz (e.g. 16000, 48000)
    "bit_depth": 16,                 // bit depth in bits (e.g. 16, 24)
    "channels": 1,                   // number of channels (1 = mono, 2 = stereo)
    "encoding": "pcm"                // audio encoding (pcm, opus, etc.)
  },
  "timestamp": "2024-01-01 12:30:45"
}
```

**Agent backend → K230 (handshake response)**:

```json
{
  "type": "audio_stream_init_response",
  "success": true,
  "message": "音频流连接已建立",
  "timestamp": "2024-01-01 12:30:45"
}
```
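
The `audio_config` values determine how much raw data each audio message carries. The sketch below works this out for uncompressed PCM with the recommended settings. The `AudioConfig` class and the 100 ms chunk duration are illustrative assumptions, not values fixed by this interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfig:
    sample_rate: int = 16000  # Hz
    bit_depth: int = 16       # bits per sample
    channels: int = 1         # 1 = mono
    encoding: str = "pcm"

    def bytes_per_second(self) -> int:
        """Raw PCM data rate; only meaningful when encoding is 'pcm'."""
        return self.sample_rate * (self.bit_depth // 8) * self.channels

    def chunk_bytes(self, chunk_ms: int) -> int:
        """Size in bytes of a chunk lasting chunk_ms milliseconds."""
        return self.bytes_per_second() * chunk_ms // 1000


config = AudioConfig()
print(config.bytes_per_second())  # 32000
print(config.chunk_bytes(100))    # 3200
```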

**K230 → Agent backend (upstream audio)**:

```json
{
  "type": "audio_stream_upload",     // message type: upload audio stream data
  "session_id": "string",            // session ID
  "data": "base64-encoded-audio",    // base64-encoded audio data
  "timestamp": "2024-01-01 12:30:45",
  "sequence": 1                      // sequence number, used for ordering
}
```
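
As an illustration of the upstream format, the sketch below splits a PCM buffer into fixed-size chunks, base64-encodes each one, and numbers them starting from 1. The function name `iter_upload_messages`, the 3200-byte chunk size (100 ms at 16 kHz, 16-bit mono), and starting `sequence` at 1 are assumptions based on the example message above.

```python
import base64
import json
from datetime import datetime


def iter_upload_messages(pcm: bytes, session_id: str, chunk_bytes: int = 3200):
    """Yield audio_stream_upload messages (JSON strings) for a PCM buffer."""
    for sequence, start in enumerate(range(0, len(pcm), chunk_bytes), start=1):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "type": "audio_stream_upload",
            "session_id": session_id,
            "data": base64.b64encode(chunk).decode("ascii"),
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "sequence": sequence,
        })


silence = bytes(8000)  # 0.25 s of 16 kHz, 16-bit mono silence
messages = list(iter_upload_messages(silence, "demo-session"))
print(len(messages))  # 3 chunks: 3200 + 3200 + 1600 bytes
```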

**Agent backend → K230 (downstream audio)**:

```json
{
  "type": "audio_stream_download",   // message type: Agent voice response
  "session_id": "string",            // session ID
  "data": "base64-encoded-audio",    // base64-encoded audio data
  "timestamp": "2024-01-01 12:30:46",
  "is_final": false,                 // whether this is the last audio segment
  "text": "string"                   // optional, the corresponding text
}
```
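
On the receiving side, one reply can be reassembled by concatenating the decoded `data` fields until a message with `is_final` set to `true` arrives. The sketch below assumes the final message still carries audio data and that `text` fragments can simply be joined. `collect_reply` is a hypothetical helper name.

```python
import base64
import json


def collect_reply(raw_messages):
    """Concatenate audio from audio_stream_download messages up to is_final."""
    audio = bytearray()
    texts = []
    for raw in raw_messages:
        message = json.loads(raw)
        if message.get("type") != "audio_stream_download":
            continue  # ignore unrelated message types
        audio += base64.b64decode(message["data"])
        if message.get("text"):
            texts.append(message["text"])
        if message.get("is_final"):
            break  # the Agent has finished speaking
    return bytes(audio), "".join(texts)
```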

**Connection control messages**:

```json
{
  "type": "audio_stream_control",    // type: audio stream control
  "session_id": "string",
  "action": "string",                // control action, one of ["pause", "resume", "end"]
  "reason": "string",                // optional, reason for the action
  "timestamp": "2024-01-01 12:30:47"
}
```
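
The three control actions suggest a small state machine for the stream. The transition table below is an assumption (the document names the actions but not the rules for when each is valid), and `apply_control` is a hypothetical helper. Actions that are not valid in the current state are ignored.

```python
import json

# Assumed transitions; the document only names the three actions.
TRANSITIONS = {
    ("streaming", "pause"): "paused",
    ("paused", "resume"): "streaming",
    ("streaming", "end"): "ended",
    ("paused", "end"): "ended",
}


def apply_control(state: str, raw: str) -> str:
    """Return the next stream state, or the current one if the action is ignored."""
    message = json.loads(raw)
    if message.get("type") != "audio_stream_control":
        return state
    return TRANSITIONS.get((state, message.get("action")), state)


state = "streaming"
for action in ("pause", "resume", "end"):
    state = apply_control(state, json.dumps(
        {"type": "audio_stream_control", "session_id": "demo-session", "action": action}
    ))
print(state)  # ended
```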

**Field descriptions**:

- `sample_rate`: audio sample rate; 16000 Hz is recommended
- `bit_depth`: audio bit depth; 16 bit is recommended
- `channels`: number of channels; mono (1) is recommended
- `encoding`: audio encoding; PCM or Opus is recommended
- `sequence`: sequence number of the audio packet, used to preserve ordering
- `is_final`: indicates whether the Agent has finished speaking
- `action`: control action
  - "pause": pause the audio stream
  - "resume": resume the audio stream
  - "end": end the audio stream

------