Gemini Computer Use — 技能 — openclaw中文资讯站

技能详情（站内镜像，无评论）

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confi…

开发与 DevOps

许可证：MIT-0

MIT-0 ·免费使用、修改和重新分发。无需归因。

版本：v1.0.0

统计：⭐ 4 · 3.3k · 13 current installs · 13 all-time installs

⭐ 4

安装量（当前） 13

🛡 VirusTotal ：良性 · OpenClaw ：可疑

Package：am-will/gemini-computer-use

安全扫描（ClawHub）

VirusTotal ：良性
OpenClaw ：可疑

OpenClaw 评估

The skill's purpose (browser automation with Gemini Computer Use) is plausible and most of the code matches that purpose, but there are clear inconsistencies (registry says no env vars while the code requires GEMINI_API_KEY), a truncated/possibly-buggy script fragment, and privacy/operational risks (screenshots sent to an external model, browser automation can act on web pages) that you should understand before installing.

目的

The name/description (Gemini Computer Use browser-control agents) matches the included script and instructions: it uses Playwright and the Google GenAI client to run a screenshot → function_call → action → function_response loop. However the registry metadata claims 'Required env vars: none' while both the SKILL.md quickstart and the script require a GEMINI_API_KEY (and optionally COMPUTER_USE_BROWSER_CHANNEL / COMPUTER_USE_BROWSER_EXECUTABLE)…

说明范围

SKILL.md tells the user to set an API key and run the provided script. The runtime instructions and code capture full-page screenshots and send them (inline image/png parts) along with the user prompt to the external Gemini model (Google GenAI). This is expected for the skill's purpose, but it means screenshots (which may contain sensitive information) are transmitted off-host. The instructions also allow the model to emit function_call action…

安装机制

There is no automated install spec (instruction-only install). The SKILL.md instructs the user to create a virtualenv and pip install google-genai and playwright, then run 'playwright install chromium'. This is a standard, low-risk approach compared to bundled downloads from arbitrary URLs. The package includes a Python script; no external downloads or extract/install steps are declared in the skill bundle itself.

证书

The code legitimately requires GEMINI_API_KEY to call the Gemini Computer Use model and optionally COMPUTER_USE_BROWSER_CHANNEL and COMPUTER_USE_BROWSER_EXECUTABLE to control browser selection. Those env vars are proportional to the stated purpose. However the public registry metadata incorrectly lists no required env vars, which is misleading. Also note that transmitting screenshots to the external API is intrinsic to functionality but is a p…

持久

The skill is not always-enabled and does not request special platform privileges. The skill is allowed to be invoked autonomously (disable-model-invocation is false), which is the platform default; combined with broad browser control capabilities, autonomous invocation increases the blast radius (the agent could autonomously navigate, click, and type). SKILL.md does recommend running in a sandboxed profile or container. There is no evidence th…

安装（复制给龙虾 AI）

将下方整段复制到龙虾中文库对话中，由龙虾按 SKILL.md 完成安装。

请把本段交给龙虾中文库（龙虾 AI）执行：为本机安装 OpenClaw 技能「Gemini Computer Use」。简介：Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. U…。
请 fetch 以下地址读取 SKILL.md 并按文档完成安装：https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/am-will/gemini-computer-use/SKILL.md
（来源：yingzhi8.cn 技能库）

SKILL.md

打开原始 SKILL.md（GitHub raw）

---
name: gemini-computer-use
description: Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
---

# Gemini Computer Use

## Quick start

1. Source the env file and set your API key:

   ```bash
   cp env.example env.sh
   $EDITOR env.sh
   source env.sh
   ```

2. Create a virtual environment and install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install google-genai playwright
   playwright install chromium
   ```

3. Run the agent script with a prompt:

   ```bash
   python scripts/computer_use_agent.py 
     --prompt "Find the latest blog post title on example.com" 
     --start-url "https://example.com" 
     --turn-limit 6
   ```

## Browser selection

- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with `COMPUTER_USE_BROWSER_CHANNEL`.
- Use a custom Chromium-based executable (e.g., Brave) with `COMPUTER_USE_BROWSER_EXECUTABLE`.

If both are set, `COMPUTER_USE_BROWSER_EXECUTABLE` takes precedence.

## Core workflow (agent loop)

1. Capture a screenshot and send the user goal + screenshot to the model.
2. Parse `function_call` actions in the response.
3. Execute each action in Playwright.
4. If a `safety_decision` is `require_confirmation`, prompt the user before executing.
5. Send `function_response` objects containing the latest URL + screenshot.
6. Repeat until the model returns only text (no actions) or you hit the turn limit.

## Operational guidance

- Run in a sandboxed browser profile or container.
- Use `--exclude` to block risky actions you do not want the model to take.
- Keep the viewport at 1440x900 unless you have a reason to change it.

## Resources

- Script: `scripts/computer_use_agent.py`
- Reference notes: `references/google-computer-use.md`
- Env template: `env.example`