Update core files and tools schema

2026-01-17 00:15:14 +08:00
parent 9b20ca8297
commit 00e9d8f5e4
4 changed files with 104 additions and 118 deletions
--- a/README.md
+++ b/README.md
@@ -1,52 +1,44 @@
 # pc-agent-loop
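-pc-agent-loop is an **extremely minimal** PC-level autonomous AI Agent framework. With fewer than 100 lines of core code and about 200 lines of tool implementations, it hands the agent the whole PC (browser, terminal, file system) and gives it physical-level automation capability.
+> **All-powerful, and dangerous: it either solves the problem or does away with the system.**
 pc-agent-loop is an extremely minimal PC-level autonomous AI Agent framework. With fewer than 100 lines of core engine code, it provides physical-level automation of the browser, terminal, and file system.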
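 ## 🚀 Core Features
- **Minimal design**: consists of only **7 basic tools** and an efficient **Agentic Loop**, with no over-engineering.
+- **Minimal design**: consists of only 7 atomic tools and an efficient Agentic Loop.
- **Autonomous code execution (Code Execution)**: writes and runs Python or PowerShell scripts on its own as the task requires, directly manipulating system resources.
+- **Autonomous code execution**: writes and runs Python or PowerShell scripts on its own as the task requires, directly manipulating system resources.
- **Deep web automation (Advanced Web Automation)**:
+- **Deep web automation**: provides semantic page scanning and JS injection for precise browser control.
-    - **Semantic scanning**: automatically cleans HTML content, turning a complex DOM into a structure that is easy for the AI to read.
+- **Precise file editing**: supports `file_patch`, which matches on blocks of source text.
-    - **JS injection**: runs custom JavaScript in the browser context for precise clicking, scrolling, or data scraping.
+- **Human-machine collaboration**: proactively requests human intervention at key decision points.
    - **TMWebDriver**: supports a persistent-session driver implemented through Tampermonkey.
 - **Precise file editing (Smart File Patching)**: rather than blindly overwriting, supports exact modification via `file_patch` by matching code blocks.
 - **Human-machine collaboration mode (Human-in-the-loop)**: proactively asks the user to step in when it hits a CAPTCHA, a critical permission, or an ambiguous decision.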
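 ## 📂 Project Structure
- `agent_loop.py`: **Core engine**, responsible for the autonomous "perceive-think-act" loop.
+- `agent_loop.py`: Core engine, responsible for the autonomous "perceive-think-act" loop.
- `ga.py`: **Toolbox**, defining the concrete implementations of the 7 core atomic tools.
+- `ga.py`: Toolbox, defining the concrete implementations of the 7 atomic tools.
- `agentapp.py`: A lightweight interactive web UI built on Streamlit.
+- `agentapp.py`: An interactive web UI built on Streamlit.
 - `sidercall.py`: LLM communication layer, supporting streaming output and API calls.
 - `TMWebDriver.py`: Browser driver module (must be used together with the Tampermonkey script).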
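-## 🛠️ Quick Start
+## 🛠️ How to Start
-### 1. Environment setup
+For the Agent to work properly, you need to complete the following manual configuration:
 - Install Python 3.8+.
 - (Optional) For web automation, install the **Tampermonkey** extension in your browser and import the corresponding script provided by this project.
-### 2. Install dependencies
+1.  **API key**: set your LLM API access key in `sidercall.py`.
-Install whatever turns out to be missing.
+2.  **Session selection**: in the `init` method of `agentapp.py`, change the `LLMSession` instance in use as needed.
-### 3. Launch the app
+Once configured, run the following from the project root:
 Run the following from the project root: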
 ```bash
 python launch.pyw
 ```
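 ## 🧩 The 7 Core Tools
-The Agent relies solely on combinations of the following 7 atomic tools to complete complex tasks:
+The Agent relies solely on combinations of the following atomic tools to complete tasks:
-
+`code_run`, `web_scan`, `web_execute_js`, `file_read`, `file_write`, `file_patch`, `ask_user`.
 1.  **`code_run`**: A dual-mode code executor (Python/PowerShell) optimized for Windows.
 2.  **`web_scan`**: Gets the cleaned, semantic HTML structure of a page; supports multi-tab management.
 3.  **`web_execute_js`**: Injects JS into a page; the result can be saved to disk for analysis.
 4.  **`file_read`**: Paged file reading with line-number positioning.
 5.  **`file_write`**: Writes a whole file or appends to it.
 6.  **`file_patch`**: Precise local modification based on matching a block of source text, keeping indentation consistent.
 7.  **`ask_user`**: Requests human intervention at key points.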
 ---
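-**⚠️ Warning**: This Agent has physical-level permission to execute local code and control the operating system. Be sure to run it in a trusted environment, and review what the Agent intends to execute before running it.
+
 ### 📝 About Auto-Generation
 **Note**: This `README.md`, the project's core prompts, and the tool descriptions (Tools SCHEMA) were generated and iteratively refined entirely by the Agent itself.
 **⚠️ Warning**: This Agent has physical-level permission to execute local code and control the operating system. Be sure to run it in a trusted environment.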
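The README describes `file_patch` as a replacement keyed on a block of source text that must be unique in the file. The following is a minimal sketch of that rule, not the code in `ga.py`; the function name `patch_text` and its error message are assumptions made for illustration.

```python
def patch_text(source: str, old_content: str, new_content: str) -> str:
    """Replace old_content with new_content only if it occurs exactly once.

    Hypothetical helper: it mirrors the uniqueness rule stated in the
    README and tool schema, not the actual implementation in ga.py.
    """
    if not old_content:
        raise ValueError("old_content must not be empty")
    count = source.count(old_content)
    if count != 1:
        # Zero matches: whitespace or indentation probably differs.
        # Several matches: the block is ambiguous and must be widened.
        raise ValueError(f"old_content must match exactly once, found {count}")
    return source.replace(old_content, new_content, 1)
```

Because matching is exact, a caller that gets a zero-match error would re-read the file first, which is what the schema text recommends.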
--- a/ga.py
+++ b/ga.py
@@ -298,25 +298,21 @@ class GenericAgentHandler(BaseHandler):
    def do_file_write(self, args, response):
        '''Intended for bulk processing of a whole file; use file_patch for fine-grained edits.
        The content to write must be placed inside a <file_content> tag or in a code block.
        '''
        path = self._get_abs_path(args.get("path", ""))
        mode = args.get("mode", "overwrite") 
        action_str = "Appending to" if mode == "append" else "Writing"
        yield f"\n[Action] {action_str} file: {os.path.basename(path)}\n"
-        def extract_intended_block(content):
-            start_marker = "```"
-            first_idx = content.find(start_marker)
-            last_idx = content.rfind(start_marker)
-            if first_idx == -1 or last_idx == -1 or first_idx == last_idx:
+        def extract_robust_content(text):
+            tag = re.search(r"<file_content>(.*?)</file_content>", text, re.DOTALL)
+            if tag: return tag.group(1).strip()
+            s, e = text.find("```"), text.rfind("```")
+            if -1 < s < e: return text[text.find("\n", s)+1 : e].strip()
            return None
-            header_end = content.find("\n", first_idx)
-            if header_end == -1 or header_end > last_idx:
-                return None
-            actual_content = content[header_end + 1 : last_idx].strip()
-            return actual_content
-        blocks = extract_intended_block(response.content)
+        blocks = extract_robust_content(response.content)
        if not blocks:
            yield f"[Status] ❌ Failed: no code block content found in the reply\n"
            return StepOutcome({"status": "error", "msg": "No code block found in response"}, next_prompt="\n")
@@ -327,8 +323,8 @@ class GenericAgentHandler(BaseHandler):
            with open(path, write_mode, encoding="utf-8") as f:
                f.write(final_content)
            yield f"[Status] ✅ {mode.capitalize()} succeeded ({len(new_content)} bytes)\n"
-            return StepOutcome({"status": "success"},
-                               next_prompt=f"\nReminder: <user_input>{self.user_input}</user_input>Please continue with the next step.\n")
+            return StepOutcome({"status": "success", 'writed_bytes': len(new_content)},
+                               next_prompt=self._get_anchor_prompt())
        except Exception as e:
            yield f"[Status] ❌ Write error: {str(e)}\n"
            return StepOutcome({"status": "error", "msg": str(e)}, next_prompt="\n")
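The new `extract_robust_content` helper in the hunk above prefers a `<file_content>` tag and falls back to the outermost code fence. Below is a standalone sketch of that lookup order that can be run outside the handler. It restores the `header_end` bounds check from the removed `extract_intended_block`; treating that check as still wanted is an assumption, since the committed one-liner drops it and would return text that still includes the opening fence when the fence has no newline after it.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built this way so the example stays fence-safe


def extract_robust_content(text: str) -> Optional[str]:
    """Return the payload of an LLM reply, or None if nothing usable is found."""
    # 1. Prefer an explicit <file_content> tag; DOTALL lets "." span newlines.
    tag = re.search(r"<file_content>(.*?)</file_content>", text, re.DOTALL)
    if tag:
        return tag.group(1).strip()
    # 2. Fall back to everything between the first and the last fence.
    start, end = text.find(FENCE), text.rfind(FENCE)
    if -1 < start < end:
        header_end = text.find("\n", start)  # end of the opening fence line
        if header_end == -1 or header_end > end:
            return None  # single-line fence: no body to extract (assumed guard)
        return text[header_end + 1 : end].strip()
    return None


reply = "Here is the file:\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nDone."
print(extract_robust_content(reply))
```

Taking the first and last fence, rather than the first pair, keeps nested fences inside the payload intact, at the cost of swallowing any prose between two separate blocks.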
--- a/sidercall.py
+++ b/sidercall.py
@@ -111,7 +111,7 @@ class ToolClient:
            prompt += f"=== {role} ===\n{m['content']}\n\n"
        self.total_cd_tokens += len(prompt)
-        if self.total_cd_tokens > 6000: self.last_tools = ''
+        if self.total_cd_tokens > 9000: self.last_tools = ''
        prompt += "=== ASSISTANT ===\n" 
        return prompt
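Despite its name, `total_cd_tokens` in the hunk above accumulates `len(prompt)`, which in Python counts characters of a `str`, not model tokens. The sketch below isolates that bookkeeping so the effect of raising the limit from 6000 to 9000 can be tried on its own. `ToolCache`, `account`, and the placeholder value of `last_tools` are hypothetical names and values; only the two attribute names come from the diff.

```python
class ToolCache:
    """Hypothetical stand-in for the bookkeeping shown in ToolClient."""

    def __init__(self, limit: int = 9000) -> None:
        self.limit = limit
        self.total_cd_tokens = 0            # name from the diff; really a character count
        self.last_tools = "cached-schema"   # placeholder for the cached tool text

    def account(self, prompt: str) -> None:
        # len() on a str counts characters, so CJK-heavy prompts reach the
        # limit after far fewer characters than their token cost suggests.
        self.total_cd_tokens += len(prompt)
        if self.total_cd_tokens > self.limit:
            self.last_tools = ""            # same reset as in the diff


cache = ToolCache(limit=10)
cache.account("12345")
print(repr(cache.last_tools))   # still cached
cache.account("123456")
print(repr(cache.last_tools))   # cleared once the running total exceeds the limit
```

What clearing `last_tools` triggers downstream is not visible in this hunk, so the sketch stops at the reset itself.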
--- a/tools_schema.json
+++ b/tools_schema.json
@@ -3,7 +3,7 @@
    "type": "function",
    "function": {
      "name": "code_run",
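-      "description": "A dual-mode code executor optimized for Windows. Prefer python for complex scripts, logic, and data processing (provide a ```python code block in the reply); use powershell only for necessary system operations (such as file management or setting environment variables). Note: do not embed large amounts of data in the code; read it from a file if needed. The code logic must be included in the body of the reply message.",
+      "description": "A dual-mode code executor optimized for Windows. Prefer python for complex logic; use powershell only for necessary system operations. Note: the code to execute must be included in the reply body as a ```python or ```powershell code block. Hard-coding large amounts of data in the code is strictly forbidden; read it from a file if needed. Execution is limited to 60s.",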
      "parameters": {
        "type": "object",
        "properties": {
@@ -13,63 +13,18 @@
              "python",
              "powershell"
            ],
-            "description": "Execution mode. python is for logic and computation, powershell is for one-line commands."
+            "description": "Execution environment type; defaults to python.",
            "default": "python"
          },
          "timeout": {
            "type": "integer",
-            "default": 60,
-            "description": "Execution timeout in seconds."
+            "description": "Execution timeout in seconds; defaults to 60.",
+            "default": 60
          },
          "cwd": {
            "type": "string",
            "description": "Working directory; defaults to the current working directory."
          }
        },
        "required": [
          "type"
        ]
      }
    }
  },
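  {
    "type": "function",
    "function": {
      "name": "web_execute_js",
      "description": "The preferred tool for browser control. Executes JavaScript to gain full control of the page (e.g. clicking, scrolling, extracting specific data). The result can be saved to a file for later analysis. Note: saving only captures an immediately available return value and is incompatible with asynchronous operations such as await.",
      "parameters": {
        "type": "object",
        "properties": {
          "script": {
            "type": "string",
            "description": "The JavaScript code to execute."
          },
          "save_to_file": {
            "type": "string",
            "description": "(Optional) File path to save the JS return value to."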
          }
        },
        "required": [
          "script"
        ]
      }
    }
  },
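  {
    "type": "function",
    "function": {
      "name": "web_scan",
      "description": "Gets the cleaned HTML content of the page. Supports multi-tab management: lists all current tabs and can switch between them. Use together with execute_js and avoid observing the full HTML to improve efficiency.",
      "parameters": {
        "type": "object",
        "properties": {
          "focus_item": {
            "type": "string",
            "description": "Semantic filter instruction. Fuzzy-searches a long list for relevant items (e.g. 'search for a specific product name'); the algorithm keeps matching content first."
          },
          "switch_tab_id": {
            "type": "string",
            "description": "Optional tab ID. If provided, switches to that tab before scanning."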
          }
        }
      }
    }
@@ -78,28 +33,28 @@
    "type": "function",
    "function": {
      "name": "file_read",
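-      "description": "Reads file content. Supports paged reading for large files; each page defaults to 100 lines with line numbers, which helps file_patch locate its target.",
+      "description": "Reads file content. Read a file before modifying it to get the latest context and line numbers. Supports paged reading; reads 100 lines at a time by default.",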
      "parameters": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
-            "description": "File path."
+            "description": "Relative or absolute file path."
          },
          "start": {
            "type": "integer",
-            "default": 1,
-            "description": "Starting line number (1-based)."
+            "description": "Starting line number (1-based).",
+            "default": 1
          },
          "count": {
            "type": "integer",
-            "default": 100,
-            "description": "Number of lines to read."
+            "description": "Number of lines to read.",
+            "default": 100
          },
          "show_linenos": {
            "type": "boolean",
-            "default": true,
-            "description": "Whether to show line numbers."
+            "description": "Whether to show line numbers; recommended, to help file_patch locate its target.",
+            "default": true
          }
        },
        "required": [
@@ -112,21 +67,21 @@
    "type": "function",
    "function": {
      "name": "file_patch",
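-      "description": "Makes fine-grained local changes to a file by finding a unique block of old text and replacing it with new text. Note: old_content must be unique in the file, and its spaces, indentation, and line breaks must match the original file exactly. If the replacement fails, first use file_read to confirm the file content.",
+      "description": "Fine-grained local file modification. Finds the unique old_content block in the file and replaces it with new_content. old_content must occur exactly once in the file, and its spaces, indentation, and line breaks must match the original file exactly. If matching fails, use file_read to re-confirm the file content.",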
      "parameters": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
-            "description": "Target file path."
+            "description": "File path."
          },
          "old_content": {
            "type": "string",
-            "description": "The original code block to be replaced (must be unique)."
+            "description": "The original block of text in the file to be replaced (must be unique)."
          },
          "new_content": {
            "type": "string",
-            "description": "The new code block after replacement."
+            "description": "The new text content after replacement."
          }
        },
        "required": [
@@ -141,13 +96,13 @@
    "type": "function",
    "function": {
      "name": "file_write",
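-      "description": "Overwrites or appends to a whole file. Mainly used to create new files or handle large changes to a file. The content to write must be included in the body of the reply message as a code block (```).",
+      "description": "Creates a file, overwrites it in full, or appends to it. For fine-grained code changes, prefer file_patch. Note: the content to write must be placed in a <file_content> tag or a code block in the reply body.",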
      "parameters": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
-            "description": "Target file path."
+            "description": "File path."
          },
          "mode": {
            "type": "string",
@@ -155,8 +110,8 @@
              "overwrite",
              "append"
            ],
-            "default": "overwrite",
-            "description": "Write mode: overwrite or append."
+            "description": "Write mode: overwrite (the default) or append.",
+            "default": "overwrite"
          }
        },
        "required": [
@@ -165,21 +120,64 @@
      }
    }
  },
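  {
    "type": "function",
    "function": {
      "name": "web_scan",
      "description": "Gets the cleaned content of the current page and lists all open tabs. Supports switching tabs. On long pages, focus_item can be used as a semantic filter to extract key information.",
      "parameters": {
        "type": "object",
        "properties": {
          "focus_item": {
            "type": "string",
            "description": "Semantic filter instruction that keeps items related to this keyword first in long lists."
          },
          "switch_tab_id": {
            "type": "string",
            "description": "Optional tab ID. If provided, the system switches to that tab before scanning."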
          }
        }
      }
    }
  },
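  {
    "type": "function",
    "function": {
      "name": "web_execute_js",
      "description": "General-purpose web control tool. Executes JavaScript to gain full control of the browser (e.g. clicking, scrolling, extracting specific data). This is the preferred tool for web scenarios. The result can optionally be saved to a local file for later analysis.",
      "parameters": {
        "type": "object",
        "properties": {
          "script": {
            "type": "string",
            "description": "The JavaScript code to execute."
          },
          "save_to_file": {
            "type": "string",
            "description": "Optional. File path to save the JS execution result (js_return) to. Note: this feature does not support asynchronous results such as await."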
          }
        },
        "required": [
          "script"
        ]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "update_plan",
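-      "description": "Syncs high-level task progress and strategic focus. Call only for the initial breakdown of multi-step logic or when the overall approach changes significantly (the original plan is infeasible). Must not be used to record minor debugging steps. Not needed for simple tasks.",
+      "description": "Updates the task's high-level plan and current strategic focus. Use only when first breaking down a multi-step task or when the plan changes significantly. Must not be used to record minor debugging steps or error corrections.",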
      "parameters": {
        "type": "object",
        "properties": {
          "plan": {
            "type": "string",
-            "description": "The updated high-level execution plan."
+            "description": "The complete high-level task roadmap."
          },
          "focus": {
            "type": "string",
-            "description": "The strategic focus of the current stage."
+            "description": "The work focus of the current stage."
          }
        }
      }
@@ -189,20 +187,20 @@
    "type": "function",
    "function": {
      "name": "ask_user",
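-      "description": "Call when a decision cannot be made automatically, user authorization is required, the user must supply private information, or confirmation is needed at a key point. After the call, the system pauses and waits for human intervention.",
+      "description": "Call this tool to interrupt the task and ask a question when a user decision or additional information is needed, or when an obstacle cannot be resolved automatically.",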
      "parameters": {
        "type": "object",
        "properties": {
          "question": {
            "type": "string",
-            "description": "The question or request to put to the user."
+            "description": "A clear question to put to the user."
          },
          "candidates": {
            "type": "array",
            "items": {
              "type": "string"
            },
-            "description": "Optional quick choices offered to the user."
+            "description": "A list of optional quick choices offered to the user."
          }
        },
        "required": [