Skill details (site mirror, no comments)
License: MIT-0
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Version: v1.5.0
Stats: ⭐ 0 · 704 · 3 current installs · 4 all-time installs
🛡 Security scan (ClawHub): VirusTotal benign · OpenClaw suspicious
Package: token-guard
OpenClaw Assessment
The package generally implements a simple quota checker that can help avoid 429s, but the SKILL.md claims many features (caching, duplicate detection, record_usage/record_429 APIs, etc.) that are not present in the code — the docs and runtime behavior are inconsistent.
Purpose
Name/description imply a token/429 prevention engine, and the included TokenGuard class does implement basic TPM/RPM checks and atomic state writes, which aligns with the stated purpose. However, SKILL.md advertises multiple features (duplicate detection, response caching, a 429 parser, record_usage/cache_response/record_429 methods, auto model fallback chains, etc.) that are not implemented in scripts/token_guard.py. That mismatch means the skill…
Instruction scope
SKILL.md usage examples instruct callers to call guard.record_usage(...), guard.cache_response(...), guard.record_429(...), and other methods, but the code only exposes TokenGuard.check_quota(...) and no record/cache methods. The instructions therefore direct an agent/developer to call non-existent APIs, which will cause runtime errors or undefined behavior. The README also claims duplicate detection and caching, but the code does not store pr…
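To make that failure mode concrete, a caller can probe for the documented methods before relying on them. This is an illustrative sketch, not part of the skill; StubGuard and the check_quota signature below are assumptions, since only the method name is known from the code:

```python
# Sketch: defend against documented-but-missing TokenGuard APIs.
# Only check_quota is known to exist; record_usage/cache_response/record_429
# are SKILL.md claims the assessment found unimplemented.

def safe_call(guard, method_name, *args, **kwargs):
    """Call a guard method only if it actually exists; otherwise no-op."""
    method = getattr(guard, method_name, None)
    if callable(method):
        return method(*args, **kwargs)
    return None  # documented API is absent from scripts/token_guard.py

class StubGuard:
    """Stand-in for the real TokenGuard; signature is assumed."""
    def check_quota(self, tokens):
        return tokens < 1000

guard = StubGuard()
print(safe_call(guard, "check_quota", 500))   # True: method exists
print(safe_call(guard, "record_usage", 500))  # None: missing API, no crash
```

This pattern degrades gracefully instead of raising AttributeError when the docs and code disagree.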
Install mechanism
No install spec is provided (instruction-only skill with a single script). No external downloads or package installs are required, which minimizes install-time risk.
Credentials
The skill requests no environment variables or credentials and the code does not read environment variables, secrets, or network endpoints. It does write a local state file but does not log prompt contents or responses, so credential or prompt exfiltration is not apparent.
Persistence
TokenGuard writes a state.json file by default into a directory computed relative to the script (base_dir = two directories above the script). That creates persistent state on disk (usage counts, request counts, window_start). This is expected for quota tracking, but note where the file will be written and whether that location is writable and appropriate. The manifest sets always:false and requests no special privileges.
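For reference, the atomic-write pattern the assessment describes usually looks like the sketch below (the actual scripts/token_guard.py implementation is not shown here, so the helper name and state shape are assumptions):

```python
import json
import os
import tempfile

def write_state_atomic(state: dict, path: str) -> None:
    """Write state.json atomically: dump to a temp file in the same
    directory, then os.replace() so readers never see a partial file."""
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on any failure
        raise

# Assumed state shape, mirroring the fields listed above.
state = {"usage": {"gemini-3-flash": 0}, "window_start": 0.0}
write_state_atomic(state, "state.json")
```

Writing the temp file in the destination directory matters: os.replace is only atomic within a single filesystem.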
Install (copy to 龙虾 AI)
Copy the entire block below into a 龙虾中文库 conversation; 龙虾 will complete the installation per SKILL.md.
Hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "Token Guard" on this machine. Summary: Prevents LLM API 429 errors by estimating tokens, tracking quotas, throttling r…
Fetch the following URL to read SKILL.md and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/edmonddantesj/token-guard/SKILL.md
(Source: yingzhi8.cn skill library)
SKILL.md
# TokenGuard — LLM API 429 Prevention Engine
<!-- 🌌 Aoineco-Verified | S-DNA: AOI-2026-0213-SDNA-TG01 -->
**Version:** 1.5.0
**Author:** Aoineco & Co.
**License:** MIT
**Tags:** rate-limit, 429, token-management, cost-optimization, llm-guard, high-performance
## Description
Prevents LLM API 429 (Rate Limit / Resource Exhausted) errors by intercepting requests before they're sent. Designed for users on free/low-cost API plans who need maximum intelligence per dollar.
**Core philosophy:** *"Intelligence is measured not by how much you spend, but by how little you need."*
## Problem
When using LLM APIs (especially Google Gemini Flash with 1M TPM limit):
- Large documents (docx, PDFs) can consume the entire minute quota in one request
- Failed requests still count toward token usage
- Retry loops after 429 errors waste more tokens → death spiral
- No built-in way to detect runaway/duplicate requests
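A capped exponential backoff with a hard retry budget is the standard way out of that death spiral; a minimal sketch, independent of this skill (the RuntimeError stand-in and function names are illustrative):

```python
import time

def call_with_backoff(call, max_retries=4, base_delay=1.0, cap=30.0):
    """Retry a rate-limited call with capped exponential backoff.
    A hard retry budget prevents the 429 -> retry -> 429 death spiral."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RuntimeError:  # stand-in for an API 429 error
            if attempt == max_retries:
                raise  # budget exhausted: surface the error, don't loop
            delay = min(cap, base_delay * (2 ** attempt))
            time.sleep(delay)

# Demo: fail twice with simulated 429s, then succeed.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429")
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))  # -> ok
```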
## Features
| Feature | Description |
|---------|-------------|
| **Pre-flight Token Estimation** | Estimates token count before API call (CJK-aware, no tiktoken dependency) |
| **Real-time Quota Tracking** | Tracks per-model per-minute token usage with sliding window |
| **Smart Throttle** | Auto-waits when quota > 80%, blocks at > 95% |
| **Duplicate Detection** | Blocks identical requests within 60s window (3+ = runaway) |
| **Response Caching** | Caches successful responses for duplicate requests |
| **Auto Model Fallback** | Switches to cheaper/available model when primary is exhausted |
| **429 Error Parser** | Extracts exact retry delay from Google/Anthropic error responses |
| **Batch vs Mistake Detection** | Distinguishes intentional bulk processing from error loops |
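Per the assessment above, several of these features are documented but not implemented in the shipped script. For illustration only, a tiktoken-free CJK-aware estimator along the lines of the first row might look like this (the 1 token per CJK character and 4 ASCII characters per token ratios are common approximations, not the skill's actual constants):

```python
def estimate_tokens(text: str) -> int:
    """Rough pre-flight token estimate without tiktoken.
    CJK ideographs tokenize near 1 token/char; other text near 4 chars/token."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return cjk + (other + 3) // 4  # ceiling division for the remainder

print(estimate_tokens("hello world"))  # 11 non-CJK chars -> 3
print(estimate_tokens("你好世界"))      # 4 CJK chars -> 4
```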
## Supported Models
Pre-configured quotas for:
- `gemini-3-flash` (1M TPM)
- `gemini-3-pro` (2M TPM)
- `claude-haiku` (50K TPM)
- `claude-sonnet` (200K TPM)
- `claude-opus` (200K TPM)
- `gpt-4o` (800K TPM)
- `deepseek` (1M TPM)
Custom quotas can be added for any model.
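As an illustration of how per-model per-minute tracking with the 80%/95% thresholds could be wired to this quota table, here is a sketch (the registry layout and class are assumptions, not the shipped code):

```python
import time

# Assumed quota registry mirroring the table above; any model can be added.
TPM_LIMITS = {"gemini-3-flash": 1_000_000, "claude-haiku": 50_000}

class SlidingWindowQuota:
    """Per-model per-minute token tracking with the 80% wait / 95% block
    thresholds described in the Features table (a sketch, not shipped code)."""
    def __init__(self, limits=TPM_LIMITS):
        self.limits = dict(limits)
        self.windows = {}  # model -> (window_start, tokens_used)

    def decide(self, model, tokens, now=None):
        now = time.time() if now is None else now
        start, used = self.windows.get(model, (now, 0))
        if now - start >= 60:  # new minute: reset the window
            start, used = now, 0
        pct = (used + tokens) / self.limits[model]
        if pct > 0.95:
            return "block"  # blocked requests consume no quota
        self.windows[model] = (start, used + tokens)
        return "wait" if pct > 0.80 else "proceed"

q = SlidingWindowQuota()
print(q.decide("claude-haiku", 30_000, now=0))   # 60% -> proceed
print(q.decide("claude-haiku", 12_000, now=10))  # 84% -> wait
print(q.decide("claude-haiku", 10_000, now=20))  # 104% -> block
```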
## Usage
```python
import time

from token_guard import TokenGuard

guard = TokenGuard()

# Before every API call:
decision = guard.check(prompt_text, model="gemini-3-flash")
if decision.action == "proceed":
    response = call_your_api(prompt_text)
    guard.record_usage(decision.estimated_tokens, model="gemini-3-flash")
    guard.cache_response(prompt_text, response)
elif decision.action == "wait":
    time.sleep(decision.wait_seconds)
    # retry
elif decision.action == "fallback":
    response = call_your_api(prompt_text, model=decision.fallback_model)
elif decision.action == "block":
    print(f"Blocked: {decision.reason}")

# If you get a 429 error:
guard.record_429("gemini-3-flash", retry_delay=53.0)
```
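The 429 parser advertised above is not present in the shipped script, per the assessment; a hedged sketch of extracting a retry delay from a Google-style retryDelay field or a generic "retry after" phrase (both formats are assumptions) could look like:

```python
import re

def parse_retry_delay(error_text: str, default: float = 30.0) -> float:
    """Pull a retry delay in seconds out of a 429 error body.
    Handles '"retryDelay": "53s"' (Google-style) and free-text
    'retry after 12.5' phrasing; falls back to a default otherwise."""
    m = re.search(r'"retryDelay"\s*:\s*"(\d+(?:\.\d+)?)s"', error_text)
    if not m:
        m = re.search(r'retry (?:after|in) (\d+(?:\.\d+)?)', error_text, re.I)
    return float(m.group(1)) if m else default

print(parse_retry_delay('{"error": {"retryDelay": "53s"}}'))  # 53.0
print(parse_retry_delay("Please retry after 12.5 seconds"))   # 12.5
print(parse_retry_delay("no hint here"))                      # 30.0
```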
## Integration with OpenClaw
Add to your agent's config or use as a middleware:
```yaml
skills:
- token-guard
```
The agent can invoke TokenGuard before any LLM API call to prevent quota exhaustion.
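One way an agent could wire this in is a thin wrapper around its API-call function; a sketch assuming only the hypothetical check interface from the usage example above (which, per the assessment, may not match the shipped code):

```python
import functools

def guarded(guard, model):
    """Decorator sketch: consult the guard before each LLM call.
    `guard` only needs a check(prompt, model=...) method returning an
    object with an .action attribute; both are assumed interfaces."""
    def wrap(call):
        @functools.wraps(call)
        def inner(prompt, *args, **kwargs):
            decision = guard.check(prompt, model=model)
            if decision.action == "block":
                raise RuntimeError(f"blocked: {decision.reason}")
            return call(prompt, *args, **kwargs)
        return inner
    return wrap

# Demo with a stub guard that always proceeds.
class _Decision:
    action, reason = "proceed", ""

class _Guard:
    def check(self, prompt, model):
        return _Decision()

@guarded(_Guard(), model="gemini-3-flash")
def call_api(prompt):
    return f"echo: {prompt}"

print(call_api("hi"))  # -> echo: hi
```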
## File Structure
```
token-guard/
├── SKILL.md # This file
└── scripts/
└── token_guard.py # Main engine (zero external dependencies)
```
## Status Output Example
```json
{
"models": {
"gemini-3-flash": {
"tpm_limit": 1000000,
"used_this_minute": 750000,
"remaining": 250000,
"usage_pct": "75.0%",
"status": "🟢 OK"
}
},
"stats": {
"total_checks": 42,
"tokens_saved": 128000,
"blocks": 3,
"fallbacks": 2
}
}
```
## Zero Dependencies
Pure Python 3.10+. No pip install needed. No tiktoken, no external API calls.
Designed for the $7 Bootstrap Protocol — every byte counts.