Files
GenericAgent/README.md
2026-02-07 08:45:59 +08:00

105 lines
6.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# PC-Agent-Loop: High-Performance Autonomous PC Controller
[English](#english) | [中文说明](#chinese)
<a name="english"></a>
PC-Agent-Loop is a minimalist yet powerful autonomous agent framework designed to bridge Large Language Models with direct OS-level execution. Unlike traditional chatbots, it possesses "physical" agency—the ability to perceive its environment, reason about complex goals, and execute multi-step operations across the file system, browsers, and local applications.
## 🚀 Evolutionary Intelligence & Extensibility
This agent is not limited to a fixed set of features. Its true power lies in its ability to **autonomously discover environment-specific capabilities** and **manufacture its own tools**:
- **Self-Discovery via Long-Term Memory**:
- The agent maintains a "Global Memory" (L2 Facts) to store system paths, credentials, and environmental status.
- It can autonomously retrieve context-aware SOPs (Standard Operating Procedures) to handle specialized tasks like Instant Messaging (IM) database recovery or Gmail API operations.
- **Dynamic Tool Manufacturing**:
- Through `code_run`, the agent can write and execute arbitrary Python scripts to interface with new hardware or software.
- Examples of self-integrated capabilities include:
- **Deep Web Interaction**: JS injection via Tampermonkey for UI automation.
- **Digital Forensics**: Querying SQLCipher-encrypted databases (e.g., encrypted local storage of IM apps).
- **Vision-Driven Logic**: Understanding UI states through local vision APIs (`ask_vision`).
- **System Indexing**: Utilizing **Everything CLI (es.exe)** for instant file discovery across the entire OS.
- **Android Automation**: ADB-based control for mobile device interaction.
## 📂 Project Architecture
- `agent_loop.py`: The core "Sense-Think-Act" engine (under 100 lines) driving the autonomous cycle.
- `ga.py`: The fundamental atomic toolset (File, Web, Code, User interaction).
- `agentapp.py` & `launch.pyw`: A Streamlit-based graphical interface and persistent launcher.
- `sidercall.py`: Robust LLM session management supporting multiple backends and model switching.
## 🛠️ Usage Examples
### 1. Autonomous Environment Adaptation
"Scan my local memory for recent SOPs regarding mail processing, then find and download my latest reimbursement receipts from Gmail."
### 2. Complex Multi-Step Automation
"Locate a specific encrypted IM database, decrypt it to find messages about 'Project X', and summarize the findings into a PDF report."
### 3. Real-Time System Intervention
"Monitor my cloud dashboard via the browser; if the status turns red, execute a local PowerShell script to restart the service and notify me."
## 🧩 Atomic Toolset (The Primitives)
The agent achieves high-level goals by orchestrating these 7 primitive actions:
1. `code_run`: The ultimate "Swiss Army Knife" for executing Python/PowerShell.
2. `web_scan`: Semantic perception of live web pages and tabs.
3. `web_execute_js`: Direct physical interaction with web DOM elements.
4. `file_read` & `file_write`: Direct disk access and file management.
5. `file_patch`: Safe, block-level code modification to evolve its own scripts.
6. `ask_user`: Bridging the gap for human decision-making or sensitive credentials.
7. `conclude_and_reflect`: The mechanism for distilling experiences into long-term memory.
---
<a name="chinese"></a>
# PC-Agent-Loop: 高性能 PC 级自主 AI Agent
pc-agent-loop 是一个极致简约的 PC 级自主 AI Agent 框架。它通过不到 100 行的核心引擎代码,构筑了对浏览器、终端和文件系统的物理级自动化能力。
## 🚀 进化智能与扩展性
本 Agent 不局限于预设功能。其核心优势在于能够**自主发现环境特定能力**并**制造属于自己的工具**
- **基于长期记忆的自我发现**:
- Agent 维护“全局记忆”L2 Facts以存储系统路径、凭据和环境状态。
- 能够自主检索上下文相关的 SOP标准作业程序以处理即时通讯软件IM数据库恢复、Gmail API 操作等专业任务。
- **动态工具制造**:
- 通过 `code_run`Agent 可以编写并执行 Python/PowerShell 脚本来对接新硬件或软件。
- **自集成能力示例**:
- **深度 Web 自动化**: 通过 Tampermonkey 进行 JS 注入实现 UI 自动化。
- **数字取证**: 查询 SQLCipher 加密的数据库(如加密的本地 IM 数据库)。
- **视觉驱动逻辑**: 通过本地视觉 API (`ask_vision`) 理解 UI 状态。
- **系统全盘索引**: 利用 **Everything CLI (es.exe)** 实现毫秒级文件检索。
- **安卓自动化**: 基于 ADB 控制移动设备交互。
## 📂 项目结构
- `agent_loop.py`: 核心引擎,负责“感知-思考-行动”的自主循环逻辑。
- `ga.py`: 工具箱,定义了原子工具的具体实现。
- `agentapp.py` & `launch.pyw`: 基于 Streamlit 的交互界面与持久化启动器。
- `sidercall.py`: LLM 通信层,支持多后端切换。
## 🛠️ 典型使用场景
1. **环境自适应**: “扫描我的本地记忆寻找邮件处理 SOP然后从 Gmail 下载最新的报销收据。”
2. **跨模块协作**: “定位特定的加密 IM 数据库并解密,查找关于‘项目 X的消息并汇总成 PDF 报告。”
3. **系统干预**: “监控云端控制台,若状态异常则执行本地脚本重启服务并邮件通知我。”
## 🧩 7 大核心原子工具
1. `code_run`: 终极工具,执行 Python/PowerShell 脚本。
2. `web_scan`: 网页与标签页的语义化感知。
3. `web_execute_js`: 物理级网页操控(点击、滚动、数据提取)。
4. `file_read` & `file_write`: 磁盘文件直接访问。
5. `file_patch`: 安全的源码级局部修改。
6. `ask_user`: 关键决策或凭据输入时的人机协作。
7. `conclude_and_reflect`: 将执行经验提炼进长期记忆的机制。
## ⚠️ 警告
本 Agent 具备执行本地代码和控制操作系统的**物理权限**。请务必在受信任的环境中运行。
---
*Note: This README was autonomously generated and refined by the Agent.*