OpenClaw

Skill details (site mirror, no comments)


Structured extraction and cleanup for public, user-authorized web pages. Use when the user wants to collect, clean, summarize, or transform content from acce...

Data & Tables

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v1.0.0

Stats: ⭐ 0 · 277 · 6 current installs · 6 all-time installs


🛡 VirusTotal: Benign · OpenClaw: Benign

Package: agistack/scraper

Security Scan (ClawHub)

  • VirusTotal: Benign
  • OpenClaw: Benign

OpenClaw Assessment

The skill's code, instructions, and resource access are consistent with a simple local web-page scraping helper for public/user-authorized pages.

Purpose

Name/description match the included scripts: fetching pages, extracting text, saving outputs locally. No unrelated credentials, binaries, or installs are requested.

Instruction Scope

SKILL.md and scripts restrict work to public/user-authorized pages and local-only storage. However, there is no runtime enforcement of those rules: the scripts will fetch any URL provided (including internal IPs/localhost), and there is no robots/paywall/captcha checking, rate limiting, or URL validation. That is expected for a small helper but is a security consideration rather than an incoherence.
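The scripts themselves include no such guard. As an illustration only (not part of the skill's code, and the helper name `is_safe_url` is hypothetical), a minimal sketch of the URL validation an operator could add to block internal IPs and localhost before fetching:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs that resolve to private, loopback, link-local, or reserved addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve every address the hostname maps to and check each one.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            return False
    return True
```

A check like this narrows the SSRF surface but is not exhaustive (e.g. DNS rebinding between check and fetch); an explicit allowlist of hosts is stricter where one is practical.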

Install Mechanism

No install spec and no remote downloads; the skill is instruction-only with bundled Python scripts, which minimizes install risk.

Credentials

The skill requires no environment variables or credentials and only writes under ~/.openclaw/workspace/memory/scraper, consistent with the declared purpose.

Persistence

The skill is not always-enabled and can be invoked by the user. It does create persistent local state (jobs.json and output files) under the user's home — this is coherent but users should be aware of stored files and cleanup policy.

Overall Conclusion

This skill appears to do what it says: fetch public pages, extract text, and save results locally. Before installing or enabling it for autonomous use, consider: (1) the scripts will fetch any URL you or the agent give them — add URL validation or an allowlist if you need to block internal/IP ranges (SSRF risk); (2) there is no enforcement of 'public/user-authorized' rules — rely on agent policies or operator oversight to prevent misuse (paywa…

Install (copy to 龙虾 AI)

Copy the entire block below into a 龙虾中文库 conversation; 龙虾 will complete the installation per SKILL.md.

Hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "Scraper" on this machine. Summary: Structured extraction and cleanup for public, user-authorized web pages. Use wh….
Fetch the following URL to read SKILL.md and complete the installation per the document: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/agistack/scraper/SKILL.md
(Source: yingzhi8.cn skill library)

SKILL.md

Open the original SKILL.md (GitHub raw)

---
name: scraper
description: Structured extraction and cleanup for public, user-authorized web pages. Use when the user wants to collect, clean, summarize, or transform content from accessible pages into reusable text or data. Do not use to bypass logins, paywalls, captchas, robots restrictions, or access controls. Local-only output.
---

# Scraper

Turn messy public pages into clean, reusable data.

## Core Purpose
Scraper is a safe extraction skill for public, user-authorized pages.
It helps the agent:
- fetch page content from a URL
- extract readable text
- strip boilerplate where possible
- save clean output locally
- prepare content for later summarization or analysis
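The bundled scripts' source is not shown on this page; as an illustration of the fetch-and-extract steps above, a sketch using only the standard library (consistent with the skill's no-external-packages claim), not the skill's actual code:

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style boilerplate."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside skipped elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

def fetch_page(url: str) -> str:
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Real extractors do considerably more boilerplate stripping (navigation, footers, ads); this shows only the basic shape of the pipeline.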

## Safety Boundaries
- Only use on public or user-authorized pages
- Do not bypass logins, paywalls, captchas, robots restrictions, or rate limits
- Do not request or store credentials
- Do not perform stealth scraping, account creation, or identity evasion
- Save outputs locally only

## Runtime Requirements
- Python 3 must be available as `python3`
- No external packages required

## Local Storage
All outputs are stored locally under:
- `~/.openclaw/workspace/memory/scraper/jobs.json`
- `~/.openclaw/workspace/memory/scraper/output/`

## Key Workflows
- **Capture a page**: `fetch_page.py --url "https://example.com"`
- **Extract readable text**: `extract_text.py --url "https://example.com"`
- **Save cleaned content**: `save_output.py --url "https://example.com" --title "Example"`
- **List prior jobs**: `list_jobs.py`

## Scripts
| Script | Purpose |
|---|---|
| `init_storage.py` | Initialize scraper storage |
| `fetch_page.py` | Download a page with standard headers |
| `extract_text.py` | Convert HTML into cleaned plain text |
| `save_output.py` | Save extracted output and register a job |
| `list_jobs.py` | Show past scraping jobs |