Web Scraper - Firecrawl

Web scraping and content extraction using Firecrawl API. Use when users need to crawl websites, extract structured data, convert web pages to markdown, scrape multiple URLs, or build knowledge bases from web content.

Media & Content

Author: antonia huang @antonia-sz

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v1.0.0

Stats: ⭐ 0 · 24 · 0 current installs · 0 all-time installs


Package: antonia-sz/web-scraper-firecrawl

Security Scan (ClawHub)

  • VirusTotal: Pending
  • OpenClaw: Suspicious

OpenClaw Evaluation

The skill largely does what it says (wraps the Firecrawl API), but there are multiple inconsistencies between the declared metadata, the runtime instructions, and the included script that reduce trustworthiness and should be fixed before installation.

Purpose

The skill's name, description, SKILL.md, and the included script all describe a Firecrawl API client (scrape, crawl, map, batch, extract), which is coherent with the declared purpose. However, the registry metadata lists no required environment variables or primary credential, even though both SKILL.md and the script require a FIRECRAWL_API_KEY to operate.

Instruction Scope

SKILL.md instructs users to set FIRECRAWL_API_KEY and to install the Python `requests` dependency, but the included script reads FIRECRAWL_API_KEY from the environment and uses urllib (not requests). The instructions expect an external API key and allow reading schema and URL list files, which is expected; however, the mismatch between docs and code, together with an apparent truncation/typo near the end of the script (an isolated 's' and tr…), reduces confidence in the bundle.
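
For concreteness, a minimal sketch of the pattern described above: the key is read from the environment and the HTTP call is made with the standard-library urllib, so the documented `requests` dependency is never imported. The endpoint path and payload shape here are assumptions based on the public Firecrawl v1 API, not code copied from the bundle.

```python
# Hedged sketch (not the bundled script): key from the environment,
# HTTP via the standard library, so `requests` is never needed.
import json
import os
import urllib.request

API_KEY = os.environ["FIRECRAWL_API_KEY"]  # raises KeyError if the key is unset

def scrape(url: str) -> dict:
    """POST a single-page scrape request; assumes the Firecrawl v1 endpoint."""
    payload = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    req = urllib.request.Request(
        "https://api.firecrawl.dev/v1/scrape",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```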

Install Mechanism

No install spec is provided (instruction-only installation) and the code file is included in the skill bundle. No remote downloads or archive extraction are used, which minimizes install-time risk.

Credentials

Only FIRECRAWL_API_KEY is used by the script (reasonable for a third-party scraping API), but the skill metadata does not declare any required env vars or a primary credential. That omission could lead users to supply a secret without expecting to. No other unrelated credentials are requested.

Persistence

The skill is not marked always:true and does not request elevated or persistent system-wide privileges. It does write output files when asked and reads user-provided files (schemas, URL lists), which is expected behavior.

Install (copy to 龙虾 AI)

Copy the entire block below into a 龙虾中文库 conversation; 龙虾 will complete the installation per SKILL.md.

Hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "Web Scraper - Firecrawl" on this machine. Summary: Web scraping and content extraction using Firecrawl API. Use when users need to…
Fetch the following address, read SKILL.md, and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/antonia-sz/web-scraper-firecrawl/SKILL.md
(Source: yingzhi8.cn skill library)

SKILL.md

Open the original SKILL.md (GitHub raw)

---
name: firecrawl
description: Web scraping and content extraction using Firecrawl API. Use when users need to crawl websites, extract structured data, convert web pages to markdown, scrape multiple URLs, or build knowledge bases from web content. Supports single page extraction, site-wide crawling, batch processing, and structured data extraction with CSS selectors.
---

# Firecrawl Skill

Powerful web scraping powered by [Firecrawl](https://github.com/mendableai/firecrawl) - turn websites into LLM-ready markdown.

## Overview

Firecrawl provides APIs for the following operations (a hypothetical endpoint mapping is sketched after the list):
- **Scrape** - Single page extraction to markdown
- **Crawl** - Entire site crawling with depth control
- **Map** - URL discovery from a starting point
- **Batch** - Multiple URL processing
- **Extract** - Structured data extraction with schemas
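
For orientation, a hypothetical mapping from these operations to Firecrawl v1 REST paths; the bundled script's actual routing may differ.

```python
# Hypothetical verb-to-endpoint table; the bundled script may route differently.
ENDPOINTS = {
    "scrape":  "/v1/scrape",        # single page -> markdown/html
    "crawl":   "/v1/crawl",         # site-wide crawl job
    "map":     "/v1/map",           # URL discovery
    "batch":   "/v1/batch/scrape",  # many URLs in one job
    "extract": "/v1/extract",       # schema-driven extraction
}
```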

## Prerequisites

1. **Firecrawl API Key** - Get free tier at https://firecrawl.dev
2. Install Python dependencies: `requests`

## Configuration

Set environment variable:
```bash
export FIRECRAWL_API_KEY="fc-your-api-key"
```

## Usage

### Single Page Scraping
```bash
# Basic scrape
firecrawl scrape https://example.com

# With specific options
firecrawl scrape https://example.com --formats markdown,html --only-main-content

# Wait for JS rendering
firecrawl scrape https://spa-app.com --wait-for 2000
```

### Site Crawling
```bash
# Crawl entire site (up to limit)
firecrawl crawl https://docs.example.com --limit 50

# With depth control
firecrawl crawl https://blog.example.com --max-depth 2 --limit 100

# Include/exclude patterns
firecrawl crawl https://site.com --include "/blog/*" --exclude "/admin/*"

# Custom formats
firecrawl crawl https://docs.example.com --formats markdown,links
```

### URL Mapping
```bash
# Discover all URLs from a site
firecrawl map https://example.com

# With search term
firecrawl map https://docs.python.org --search "tutorial"
```

### Batch Processing
```bash
# Scrape multiple URLs
firecrawl batch urls.txt --output ./scraped/

# From JSON list
firecrawl batch urls.json --formats markdown --concurrency 5
```
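
A sketch of how a batch runner might handle the two input formats above, assuming urls.txt holds one URL per line and urls.json holds a JSON array of URL strings; `fetch` is a placeholder for the per-URL scrape call, and the bundled script may differ.

```python
# Sketch only: load urls.txt (one URL per line) or urls.json (JSON array),
# then scrape with bounded concurrency. `fetch` is a placeholder callable.
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

def load_urls(path: str) -> list[str]:
    text = Path(path).read_text()
    if path.endswith(".json"):
        return json.loads(text)  # assumes a JSON array of URL strings
    return [line.strip() for line in text.splitlines() if line.strip()]

def run_batch(path: str, fetch: Callable[[str], dict], concurrency: int = 5) -> list[dict]:
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch, load_urls(path)))
```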

### Structured Extraction
```bash
# Extract specific data using CSS selectors
firecrawl extract https://example.com/products \
  --schema '{"name": ".product-title", "price": ".price", "description": ".desc"}'

# Extract to JSON
firecrawl extract https://news.example.com/article --schema article-schema.json
```
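
For illustration, a sketch assuming `--schema` accepts either inline JSON (first example) or a path to a .json file (second example); the payload shown is hypothetical, not the script's actual request body.

```python
# Assumption: --schema is either inline JSON or a path to a .json file.
import json
from pathlib import Path

def load_schema(arg: str) -> dict:
    """Inline JSON if the argument looks like an object, else read the file."""
    if arg.lstrip().startswith("{"):
        return json.loads(arg)
    return json.loads(Path(arg).read_text())

# Hypothetical extract payload mirroring the inline example above.
payload = {
    "urls": ["https://example.com/products"],
    "schema": load_schema('{"name": ".product-title", "price": ".price"}'),
}
```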

## Output Formats

### Markdown
Clean, LLM-ready markdown with:
- Headings preserved
- Links converted to markdown format
- Images with alt text
- Tables formatted as markdown tables

### HTML
Raw or cleaned HTML

### Links
Extracted link lists for further crawling

### Screenshot
Page screenshot (if requested)

## Use Cases

### Knowledge Base Building
```bash
# Crawl documentation site
firecrawl crawl https://docs.framework.com --limit 200 -o ./kb/

# Merge into single file for RAG
cat ./kb/*.md > knowledge-base.md
```

### Research & Analysis
```bash
# Scrape competitor pricing
firecrawl batch competitors.txt --extract pricing-schema.json

# Monitor blog updates
firecrawl map https://blog.company.com --since 2024-01-01
```

### Content Migration
```bash
# Export old CMS content
firecrawl crawl https://old-site.com --formats markdown,html -o ./export/
```

## Scripts

All functionality via `scripts/firecrawl.py`:
- Handles API authentication
- Automatic rate limiting
- Retry logic for failures (a sketch of the retry/rate-limit pattern follows this list)
- Progress tracking for large crawls
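
A hedged sketch of the retry and rate-limit handling this list describes, using only the standard library; `scripts/firecrawl.py` may implement it differently.

```python
# Sketch: retry HTTP 429/5xx with exponential backoff, honoring Retry-After.
import time
import urllib.error
import urllib.request

def request_with_retry(req: urllib.request.Request,
                       attempts: int = 4,
                       base_delay: float = 1.0) -> bytes:
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Give up on non-retryable statuses or on the last attempt.
            if err.code not in (429, 500, 502, 503) or attempt == attempts - 1:
                raise
            # Prefer the server's Retry-After hint; fall back to backoff.
            delay = float(err.headers.get("Retry-After") or base_delay * 2 ** attempt)
            time.sleep(delay)
    raise AssertionError("unreachable")
```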

## Integration

Works well with:
- `markdown-sync-pro` - Sync scraped content to Notion/GitHub
- `arxiv-paper` - Combine with academic paper downloads
- `maybe-finance` - Scrape financial data for analysis