Personal Knowledge Base Lite — 技能 — openclaw中文资讯站

技能详情（站内镜像，无评论）

Local file knowledge base with LLM-powered indexing and Q&A. Scans a directory for documents (.txt, .md, .pdf, .docx, .xlsx, .csv), extracts metadata (summar...

媒体与内容

许可证：MIT-0

MIT-0 ·免费使用、修改和重新分发。无需归因。

版本：v1.0.0

统计：⭐ 1 · 53 · 0 current installs · 0 all-time installs

⭐ 1

安装量（当前） 0

🛡 VirusTotal ：良性 · OpenClaw ：良性

Package：congshengwu/personal-kb-lite

安全扫描（ClawHub）

VirusTotal ：良性
OpenClaw ：良性

OpenClaw 评估

The skill's behavior and requirements are consistent with a local file-based knowledge‑base: the requested packages match the file types it indexes and the runtime instructions confine activity to a user-specified watch directory, but a small ambiguity in the install spec and the usual privacy risks of indexing arbitrary folders warrant caution.

目的

The name/description (local KB, indexing PDFs/DOCX/XLSX/CSV/md/txt) aligns with required binaries (python3) and the listed Python packages (pdfplumber, python-docx, openpyxl) which are appropriate for parsing those formats. No unrelated credentials, binaries, or config paths are requested.

说明范围

Instructions explicitly read/write .kb-config.json and .kb-meta.json and scan a user-provided WATCH_DIR using find/stat/wc; this is appropriate for indexing. However, because the tool asks the user for an absolute path and will operate on any path provided, the user must avoid pointing it at sensitive system or home directories. There are no instructions to transmit data externally, but the agent’s Read/Write tools and LLM summarization will a…

安装机制

The install entries list pdfplumber, python-docx, and openpyxl — reasonable for the stated task and typically obtained from PyPI. The install kind is labeled 'uv', which is nonstandard/ambiguous in this manifest (not a well-known installer name like 'pip' or a direct GitHub release); clarify the install mechanism before installing so you know what will be fetched and executed.

证书

The skill requests no environment variables, credentials, or external tokens. It only reads/writes local config and metadata files in the working directory and WATCH_DIR, which is proportionate to a local KB.

持久

The skill is not marked always:true and does not request system-wide changes or other skills' configurations. It writes its own .kb-config.json/.kb-meta.json only. Autonomous invocation is allowed by default (normal for skills) but not otherwise elevated.

综合结论

This skill appears to do what it says: build a local index and answer questions from files. Before installing: - Do not point WATCH_DIR to system folders, your home directory, or any location containing secrets; create a dedicated directory with only the documents you want indexed. - Confirm what the installer 'uv' actually does (ask the publisher or registry) and prefer installing known packages via pip from PyPI or in a virtual environment. …

安装（复制给龙虾 AI）

将下方整段复制到龙虾中文库对话中，由龙虾按 SKILL.md 完成安装。

请把本段交给龙虾中文库（龙虾 AI）执行：为本机安装 OpenClaw 技能「Personal Knowledge Base Lite」。简介：Local file knowledge base with LLM-powered indexing and Q&A. Scans a directory …。
请 fetch 以下地址读取 SKILL.md 并按文档完成安装：https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/congshengwu/personal-kb-lite/SKILL.md
（来源：yingzhi8.cn 技能库）

SKILL.md

打开原始 SKILL.md（GitHub raw）

---
name: personal-kb-lite
description: >
  Local file knowledge base with LLM-powered indexing and Q&A.
  Scans a directory for documents (.txt, .md, .pdf, .docx, .xlsx, .csv),
  extracts metadata (summary, tags, table of contents) using the LLM,
  and answers questions by retrieving relevant files.
version: 1.0.0
metadata:
  openclaw:
    requires:
      bins:
        - python3
    install:
      - kind: uv
        package: pdfplumber
      - kind: uv
        package: python-docx
      - kind: uv
        package: openpyxl
    keywords:
      - knowledge base
      - document search
      - file indexing
      - RAG
      - Q&A
      - local search
    categories:
      - productivity
      - knowledge-management
    command: kb
    emoji: "📚"
---

## Task: Local Knowledge Base

Arguments: **$ARGUMENTS**

### Mode Selection

- If `$ARGUMENTS` is `--alias <name>`: set a custom command alias — save `{"command_alias": "<name>"}` into `.kb-config.json` (merge with existing fields), then reply: `Command alias set to /<name>`. Terminate.
- If `$ARGUMENTS` is `--alias reset`: remove the `command_alias` field from `.kb-config.json`, then reply: `Command alias reset to default /kb`. Terminate.
- If `$ARGUMENTS` is empty (user only typed `/kb`), execute the **Indexing Workflow** below
- If `$ARGUMENTS` is not empty, treat it as a user question and execute the **Q&A Workflow** below

---

## Indexing Workflow

### Step 1: Confirm Watch Directory

1. Read `.kb-config.json` from the current working directory using the Read tool
2. If the file exists and `watch_dir` is a non-empty string, use it as `WATCH_DIR`
3. If the file does not exist or `watch_dir` is empty, **ask the user**:
   > Please enter the absolute path of the directory to watch:

   Save the user's input as `WATCH_DIR`, and write the following to `.kb-config.json` using the Write tool:
   ```json
   {"watch_dir": "<WATCH_DIR>"}
   ```

4. Verify the directory exists using Bash:
   ```bash
   ls "<WATCH_DIR>" > /dev/null 2>&1 && echo "ok" || echo "not_found"
   ```
   If it returns `not_found`, inform the user and terminate.

### Step 2: Load Existing Metadata

Read `<WATCH_DIR>/.kb-meta.json` using the Read tool.

- If the file exists, parse the JSON and extract the `files` object (keyed by filename)
- If it does not exist, initialize `files = {}`

### Step 3: Scan Directory for New and Modified Files

List files recursively in the directory using Bash:

```bash
find "<WATCH_DIR>" -type f ( -name "*.txt" -o -name "*.md" -o -name "*.docx" -o -name "*.pdf" -o -name "*.xlsx" -o -name "*.csv" ) -not -name ".kb-meta.json" -print
```

For each file found, get its modification time using Bash:

```bash
stat -f "%m" "<filepath>"   # macOS
stat -c "%Y" "<filepath>"   # Linux
```

A file needs (re-)indexing if:
- Its relative path is not in the `files` object, OR
- Its modification time is newer than the `indexed_at` timestamp in the existing entry

Also, remove any entries from `files` whose files **no longer exist on disk** (stale cleanup).

Use the relative path from `WATCH_DIR` as the key (e.g., `subdir/report.pdf`).

If there are no files to process and no stale entries to clean, output:
```
No changes detected (X files already indexed)
```
Then terminate.

### Step 4: Process Each File That Needs Indexing

For each file, follow these steps:

#### 4a. Read File Content

Read the file using the method described in **Appendix: File Reading Methods** below, with `MAX_CHARS = 8000`.

If reading fails, record `"error": "Read failed: <error message>"` in the file's metadata entry and skip to the next file.

#### 4b. Extract Metadata

After reading the file content, generate the following fields:

- **`summary`**: A 100-200 word summary describing the file's core content and purpose
- **`tags`**: 3-8 classification tags as a JSON array
- **`toc`**: A list of the file's major section/chapter titles as a JSON array (max 20 items); return an empty array if no clear structure exists

#### 4c. Update and Save Metadata Immediately

Add the following entry to the `files` object:

```json
"<filename>": {
  "filename": "<filename>",
  "summary": "<summary>",
  "tags": ["tag1", "tag2"],
  "toc": ["section1", "section2"],
  "indexed_at": "<current time, ISO 8601 format>",
  "file_size": <bytes, obtained via Bash wc -c>
}
```

**Write the complete `.kb-meta.json` back to disk immediately after processing each file**, to avoid losing progress if interrupted.

Show progress for each file, e.g.:
```
[1/3] Processing: contract_template.txt ... done
[2/3] Processing: product_manual.pdf ... done
[3/3] Processing: quotation.xlsx ... failed (read error)
```

### Step 5: Output Summary

```
Indexing complete
  Total files scanned: X
  Newly indexed: Y
  Failed/skipped: Z
  Metadata file: <WATCH_DIR>/.kb-meta.json
```

---

## Q&A Workflow

User question: **$ARGUMENTS**

### Step 1: Load Configuration and Metadata

1. Read `.kb-config.json` from the current working directory using the Read tool, and get the `watch_dir` field (referred to as `WATCH_DIR`)

   If the file does not exist or `watch_dir` is empty, reply:
   > Knowledge base directory not configured. Please run `/kb` first to set up the index.

   Then terminate.

2. Read `<WATCH_DIR>/.kb-meta.json` using the Read tool

   If the file does not exist or `files` is an empty object, reply:
   > Knowledge base index is empty. Please run `/kb` first to build the index.

   Then terminate.

### Step 2: Build File Index Summary

Organize each entry in `files` into a single line of plain text:

```
File: contract_template.txt, Summary: Standard procurement contract template with rights and obligations..., Tags: contract, procurement, template, TOC: Article 1 Definitions, Article 2 Rights, Article 3 Breach
File: product_manual.pdf, Summary: Technical specifications and installation guide for Product A..., Tags: product, specs, TOC: Overview, Installation, FAQ
```

- If a field is empty or missing, omit it
- `toc`: show at most the first 15 items, comma-separated
- `tags`: comma-separated

### Step 3: Identify Relevant Files

Based on the index summary and user question, determine which files **may** contain the answer.

Criteria:
- The file's summary, tags, or table of contents contain information semantically related to the question
- Prefer including a marginally relevant file over missing a potentially useful one

**If no files are possibly relevant**, reply directly:
> Sorry, no files related to your question were found in the knowledge base.
>
> Current indexed file count: X
> Suggestion: Verify the question falls within the knowledge base scope, or run `/kb` to update the index.

Then terminate.

Record the matched filenames as an internal variable `MATCHED_FILES` (do not show this list to the user).

### Step 4: Read Full Content of Relevant Files

For each filename in `MATCHED_FILES`, read the file using the method described in **Appendix: File Reading Methods** below (no truncation, omit `MAX_CHARS`).

If a file fails to read, skip it and continue with the remaining files.

### Step 5: Generate Answer

Based on the content of the retrieved files, answer the user's question.

Requirements:

1. **Answer directly**, preferring to quote the file's original text, data, or specific statements
2. If multiple files contain relevant information, synthesize the answer and note which file each part comes from
3. If the file content can only partially answer the question, indicate which parts could not be found in the knowledge base
4. Append **citation sources** at the end of the answer, formatted as:

```
---
Sources:
[1] contract_template.txt
[2] product_manual.pdf
```

Only list files actually used in the answer; do not include matched but unused files.

---

## Appendix: File Reading Methods

Use the following methods based on file extension. All Bash commands pass the file path via the `KB_FILE` environment variable to avoid filename injection issues.

Optional parameter `MAX_CHARS`: if provided, truncate output to that many characters; if omitted, output the full content.

- **`.txt` / `.md` / `.csv`**: Read using the Read tool at `<WATCH_DIR>/<filename>`. If `MAX_CHARS` is set, only read the first `MAX_CHARS` characters (approximately `MAX_CHARS / 40` lines).

- **`.docx`**: Run via Bash:
  ```bash
  KB_FILE="<WATCH_DIR>/<filename>" python3 -c "
  import docx, os
  doc = docx.Document(os.environ['KB_FILE'])
  text = 'n'.join(p.text for p in doc.paragraphs if p.text.strip())
  limit = int(os.environ.get('KB_MAX_CHARS', '0'))
  print(text[:limit] if limit else text)
  "
  ```

- **`.pdf`**: Run via Bash:
  ```bash
  KB_FILE="<WATCH_DIR>/<filename>" python3 -c "
  import pdfplumber, os
  with pdfplumber.open(os.environ['KB_FILE']) as pdf:
      text = 'n'.join(p.extract_text() or '' for p in pdf.pages)
  limit = int(os.environ.get('KB_MAX_CHARS', '0'))
  print(text[:limit] if limit else text)
  "
  ```

- **`.xlsx`**: Run via Bash:
  ```bash
  KB_FILE="<WATCH_DIR>/<filename>" python3 -c "
  import openpyxl, os
  wb = openpyxl.load_workbook(os.environ['KB_FILE'], data_only=True)
  rows = []
  for ws in wb.worksheets:
      for row in ws.iter_rows(values_only=True):
          line = 't'.join(str(v) for v in row if v is not None)
          if line.strip():
              rows.append(line)
  text = 'n'.join(rows)
  limit = int(os.environ.get('KB_MAX_CHARS', '0'))
  print(text[:limit] if limit else text)
  "
  ```

When `MAX_CHARS` is needed, prepend `KB_MAX_CHARS="<value>"` to the environment variables, e.g.:
```bash
KB_FILE="..." KB_MAX_CHARS="8000" python3 -c "..."
```