PDF OCR Using Gemini LLM · Skill · openclaw Chinese news site

Skill details (on-site mirror, no comments)

Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.

Media and content

Author: Issam El Alaoui @ashtonizmev

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v0.1.7

Stats: ⭐ 0 · 180 · 1 current installs · 1 all-time installs

🛡 Security scan (ClawHub) · VirusTotal: benign · OpenClaw: benign

Package: ashtonizmev/geminipdfocr

OpenClaw assessment

The skill's code, install requirements, and runtime instructions match its stated purpose (OCR via Google Gemini) and request only the single expected credential (GOOGLE_API_KEY).

Purpose

Name/description, required env (GOOGLE_API_KEY), listed Python packages (google-genai, pymupdf), CLI entry point, and code all align with a PDF OCR tool that uploads pages to Google's Gemini API.

Instruction scope

The SKILL.md and code explicitly split PDFs into single-page files and upload full page files to Google's API for OCR. This behaviour is documented in the README and implemented in gemini_client.py (files.upload + models.generate_content). There are no apparent instructions or code that read unrelated files, other env vars, or send data to unknown endpoints, but note that entire page images are transmitted to Google (privacy/cost implication).
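
The split-then-upload flow described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual `gemini_client.py`: the helper names (`pages_to_process`, `split_pdf`, `ocr_page`), the page file naming, the prompt text, and the model name `gemini-2.0-flash` are all assumptions. PyMuPDF and google-genai are imported lazily inside the functions that need them.

```python
import os
from pathlib import Path


def pages_to_process(page_count, max_pages=None):
    """Return the zero-based page indices to OCR, honouring an optional cap."""
    limit = page_count if max_pages is None else min(page_count, max_pages)
    return list(range(max(limit, 0)))


def split_pdf(pdf_path, out_dir, max_pages=None):
    """Write each selected page to its own single-page PDF; return the paths."""
    import fitz  # PyMuPDF

    out_dir = Path(out_dir)
    paths = []
    src = fitz.open(str(pdf_path))
    try:
        for i in pages_to_process(src.page_count, max_pages):
            page_doc = fitz.open()  # new, empty PDF
            page_doc.insert_pdf(src, from_page=i, to_page=i)
            out_path = out_dir / f"page_{i + 1:04d}.pdf"  # hypothetical naming
            page_doc.save(str(out_path))
            page_doc.close()
            paths.append(out_path)
    finally:
        src.close()
    return paths


def ocr_page(page_path, model="gemini-2.0-flash"):
    """Upload one page file to Gemini and return the text it extracts."""
    from google import genai

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    uploaded = client.files.upload(file=str(page_path))
    response = client.models.generate_content(
        model=model,  # assumed model name; the skill may use another
        contents=[uploaded, "Extract all text from this page."],
    )
    return response.text
```

Each call to `ocr_page` sends one full page file to Google, so an N-page PDF produces N uploads and N generation requests.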

Install mechanism

Dependencies are standard Python packages (google-genai, pymupdf, pydantic, pydantic-settings) and a requirements.txt is included. No downloads from custom URLs or extracts from arbitrary hosts are present.

Credentials

Only GOOGLE_API_KEY is required and declared as the primary credential. That single key is appropriate and required for the Google Gemini client used by the skill. No unrelated secrets or config paths are requested.

Persistence

The skill is not always-enabled, does not modify other skills, and only writes temporary files under the system temp directory (cleans up after processing). It does not request elevated system persistence.
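
The temporary-file behaviour described above could look something like the sketch below, which uses the standard library's `tempfile.TemporaryDirectory` so the per-page files disappear once processing ends. The function name `process_in_temp_dir` and the `geminipdfocr_` prefix are hypothetical; the skill's real cleanup code may differ.

```python
import tempfile
from pathlib import Path


def process_in_temp_dir(work):
    """Run work(tmp_dir) in a temp directory that is deleted afterwards.

    Returns the result of work and the (now removed) directory path.
    """
    with tempfile.TemporaryDirectory(prefix="geminipdfocr_") as tmp:
        tmp_dir = Path(tmp)
        result = work(tmp_dir)
    # The directory and everything in it are gone once the block exits,
    # even if work() raised an exception.
    return result, tmp_dir
```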

Overall conclusion

This skill appears to be what it says: it splits PDFs into single-page files and uploads them to Google Gemini for OCR, and it requires only GOOGLE_API_KEY. Before installing, consider: (1) privacy — full page images are sent to Google, so do not use with highly sensitive documents unless acceptable; (2) cost and quotas — large PDFs mean many uploads and API usage billed against your API key; (3) secure the GOOGLE_API_KEY (don’t paste it into …

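Install (copy to Lobster AI)

Copy the entire passage below into a Lobster Chinese Library (龙虾中文库) conversation, and Lobster will complete the installation by following SKILL.md.

Please hand this passage to Lobster Chinese Library (Lobster AI) to execute: install the OpenClaw skill "PDF OCR Using Gemini LLM" on this machine. Summary: Extract text from PDFs using Google Gemini OCR. Use when extracting text from P….
Please fetch the following address to read SKILL.md and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/ashtonizmev/geminipdfocr/SKILL.md
(Source: yingzhi8.cn skill library)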

SKILL.md

---
name: PDF OCR using Gemini LLM
description: Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.
metadata:
  openclaw:
    requires:
      env:
        - GOOGLE_API_KEY
    primaryEnv: GOOGLE_API_KEY
    install:
      - kind: uv
        package: google-genai
        label: "Python deps"
      - kind: uv
        package: pymupdf
      - kind: uv
        package: pydantic
      - kind: uv
        package: pydantic-settings
---

## Purpose

Use geminipdfocr to extract text from PDF documents via OCR (Google Gemini).

## Data and privacy

**Full page images/files are sent to Google's API.** PDFs are split into single-page files and each page is uploaded to Google Gemini for OCR. There are no hidden exfiltration endpoints or other data collection. Do not use with highly sensitive documents unless you accept that content is sent to Google.

## Setup (venv installation)

Before first use, create and activate the virtual environment:

```bash
cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
```

Set `GOOGLE_API_KEY` in your environment before running (e.g. `export GOOGLE_API_KEY=your-key`).

## How to use

When requested to extract text or perform OCR on a PDF:

1. Run: `cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr <path-to-pdf> [--json] [--output <file>]`
2. Use `--json` for structured data.
3. Use `--max-pages N` for testing or very long documents.
4. Use `--quiet` to suppress progress logs.
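
For callers that prefer to drive the CLI from Python, the invocation above can be assembled as an argument list. This is only a sketch: `build_command` and its parameter names are hypothetical, and it assumes `python` resolves to the interpreter inside the activated venv. The flags themselves are the ones listed in this document.

```python
def build_command(pdf_path, as_json=False, output=None, max_pages=None,
                  quiet=False, python="python"):
    """Build the argv list for `python -m geminipdfocr` with the chosen flags."""
    cmd = [python, "-m", "geminipdfocr", str(pdf_path)]
    if as_json:
        cmd.append("--json")
    if output is not None:
        cmd += ["--output", str(output)]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if quiet:
        cmd.append("--quiet")
    return cmd
```

The list can then be passed to `subprocess.run(cmd, cwd="geminipdfocr", check=True)`, which avoids shell quoting problems with paths that contain spaces.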

## Requirements

- A valid PDF file path.
- `GOOGLE_API_KEY` set in the process environment (e.g. `export GOOGLE_API_KEY=your-key`).
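
A small pre-flight check along these lines can catch both requirements before any page is uploaded. It is an illustrative sketch only: `check_requirements` is a hypothetical helper and is not part of the skill.

```python
import os
from pathlib import Path


def check_requirements(pdf_path, env=None):
    """Return a list of problems; an empty list means the run can proceed."""
    env = os.environ if env is None else env
    problems = []
    path = Path(pdf_path)
    if not path.is_file():
        problems.append(f"PDF not found: {path}")
    elif path.suffix.lower() != ".pdf":
        problems.append(f"Not a .pdf file: {path}")
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    return problems
```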

## CLI options

| Option | Description |
|--------|-------------|
| `pdf_path` | One or more PDF file paths (positional) |
| `--max-pages N` | Limit pages per PDF |
| `--json` | Output structured JSON instead of plain text |
| `--output FILE` | Write result to file (default: stdout) |
| `--quiet` | Suppress INFO/DEBUG logs |
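
The options table maps naturally onto an `argparse` parser. The sketch below mirrors the table and is not taken from the skill's source, so defaults and help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented CLI options."""
    parser = argparse.ArgumentParser(
        prog="geminipdfocr",
        description="Extract text from PDFs using Google Gemini OCR.",
    )
    parser.add_argument("pdf_path", nargs="+",
                        help="One or more PDF file paths")
    parser.add_argument("--max-pages", type=int, metavar="N", default=None,
                        help="Limit pages per PDF")
    parser.add_argument("--json", action="store_true",
                        help="Output structured JSON instead of plain text")
    parser.add_argument("--output", metavar="FILE", default=None,
                        help="Write result to file (default: stdout)")
    parser.add_argument("--quiet", action="store_true",
                        help="Suppress INFO/DEBUG logs")
    return parser
```

For example, `build_parser().parse_args(["a.pdf", "b.pdf", "--max-pages", "2", "--json"])` yields two paths, a page limit of 2, and JSON output enabled.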