Dataset Intake Auditor - Skill - OpenClaw Chinese News Site

Skill details (on-site mirror, no comments)

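Checks fields, units, missing rates, outliers, and usability before a new dataset is ingested. Use for data, dataset, and audit workflows; do not use for fabricating statistical results or replacing a formal data governance platform.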

Data and Spreadsheets

Author: vx: 17605205782 @52YuanChangXing

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v1.0.0

Stats: ⭐ 0 · 81 · 0 current installs · 0 all-time installs

🛡 VirusTotal: benign · OpenClaw: benign

Package: 52yuanchangxing/dataset-intake-auditor

Security scan (ClawHub)

VirusTotal: benign
OpenClaw: benign

OpenClaw assessment

The skill is internally coherent: it is a local, read-only Python-based dataset/audit helper (CSV/TSV focus) and does not request unrelated credentials or install remote code.

Purpose

Name/description (dataset intake audit) match the included files and scripts. The only required binary is python3 and the code uses only the standard library. There are no environment variables, external credentials, or unexpected binaries requested.

Instruction scope

SKILL.md instructs the agent to run the included scripts/run.py or to produce output from local templates if execution is not available. The script is designed primarily for CSV/TSV auditing (spec.mode is 'csv_audit'), but it also implements directory and pattern-audit helpers that can read many text file types (md, py, sh, json, csv, etc.). This is expected for an audit tool, but it means the tool will read any files the user points it at — s…

Install mechanism

No install spec is provided (instruction-only with an included local script). No downloads, package installs, or archive extraction are performed by the skill. This is low-risk from an install standpoint.

Credentials

The skill declares no required environment variables or credentials. The code does not reference external API keys or secret config. This is proportionate to its stated purpose.

Persistence

always is false and the skill does not request persistent privileges or modify other skills or global config. The bundle is local and runs only when invoked.

Overall conclusion

This skill appears to do what it says: local, read-only dataset auditing via a bundled Python script. Before running: (1) inspect scripts/run.py yourself (it's included) and run smoke tests; (2) invoke it only on intended dataset files or a dedicated workspace — do not point it at system or credential-containing directories; (3) if outputs will be shared with external systems or pasted into chats, scrub any sensitive values (the tool can read …

Installation (copy to Lobster AI)

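Copy the entire block below into a Lobster Chinese Library conversation, and Lobster will complete the installation according to SKILL.md.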

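Please hand this block to Lobster Chinese Library (Lobster AI) to execute: install the OpenClaw skill "Dataset Intake Auditor" on this machine. Summary: Checks fields, units, missing rates, outliers, and usability before a new dataset is ingested. Use for data, dataset, audit workflows; do not use ….
Please fetch the following address to read SKILL.md and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/52yuanchangxing/dataset-intake-auditor/SKILL.md
(Source: yingzhi8.cn skill library)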

SKILL.md

---
name: dataset-intake-auditor
version: 1.0.0
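description: "Check fields, units, missing rates, outliers, and usability before a new dataset is ingested. Use for data, dataset, audit workflows; do not use for fabricating statistical results or replacing a formal data governance platform."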
author: OpenClaw Skill Bundle
homepage: https://example.invalid/skills/dataset-intake-auditor
tags: [data, dataset, audit, ingestion]
user-invocable: true
metadata: {"openclaw":{"emoji":"🧺","requires":{"bins":["python3"]},"os":["darwin","linux","win32"]}}
---
# Dataset Intake Auditor

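## What you are
You are the standalone Skill "Dataset Intake Auditor". Your job is to check fields, units, missing rates, outliers, and usability before a new dataset is ingested.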

## Routing
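### When to use
- Checking whether this dataset can be ingested
- Producing a field and missing-rate audit
- Input usually consists of CSV/TSV files or a directory
- Prioritized outputs: dataset overview, field summary, follow-up actions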
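
The kind of field and missing-rate audit described above can be sketched with the standard library alone. This is an illustrative sketch, not the logic of the bundled `scripts/run.py`: the function names `audit_csv` and `iqr_outliers`, the 1.5 × IQR outlier rule, and the choice to treat blank cells as missing are all assumptions made for the example.

```python
import csv
import io
import statistics


def iqr_outliers(cells):
    """Return numeric cells outside 1.5 * IQR; empty if the field is not numeric."""
    try:
        numbers = [float(c) for c in cells]
    except ValueError:
        return []  # at least one cell is not a number, so skip this field
    if len(numbers) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    spread = 1.5 * (q3 - q1)
    return [x for x in numbers if x < q1 - spread or x > q3 + spread]


def audit_csv(text, delimiter=","):
    """Summarize row count, per-field missing rate, and IQR outliers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fields = reader.fieldnames or []
    values = {name: [] for name in fields}
    rows = 0
    for row in reader:
        rows += 1
        for name in fields:
            # Absent cells come back as None; treat them like blank strings.
            values[name].append((row.get(name) or "").strip())

    summary = {}
    for name in fields:
        present = [v for v in values[name] if v != ""]
        summary[name] = {
            "missing_rate": 1 - len(present) / rows if rows else 0.0,
            "outliers": iqr_outliers(present),
        }
    return {"rows": rows, "fields": summary}


sample = "id,height_cm,city\n1,172,Paris\n2,,Lyon\n3,180,\n4,165,Nice\n"
for field, stats in audit_csv(sample)["fields"].items():
    print(field, stats)
```

For TSV input, pass `delimiter="\t"`. Quartile-based rules are unreliable on very small samples, which is why the sketch declines to flag anything when a field has fewer than four numeric values.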

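### When not to use
- Do not fabricate statistical results
- Do not use it as a replacement for a formal data governance platform
- If the user wants to directly write to, send to, delete from, publish to, or reconfigure an external system, clarify the boundaries first, then provide only review-ready content or a dry-run plan.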

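## Working rules
1. First reorganize the information the user provides into a task brief, then output structured results.
2. When information is missing, explicitly list "items to be confirmed" rather than making things up.
3. By default, give a "reviewable draft" first, then an "executable checklist".
4. For high-risk, privacy, permission, or compliance issues, always add a boundary statement.
5. If the runtime environment allows shell / exec, you may use:
   - `python3 "{baseDir}/scripts/run.py" --input <input file> --output <output file>`
6. If the current environment cannot execute scripts, still produce the text directly, following the structure of `{baseDir}/resources/template.md` and `{baseDir}/resources/spec.json`.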
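
The choice between rules 5 and 6 can be sketched as a small decision helper. This is a hypothetical illustration: the function name `plan_run` and its return shape are assumptions, and only the `--input` / `--output` flags and the file locations come from the rules above.

```python
import shutil
from pathlib import Path


def plan_run(base_dir, input_path, output_path):
    """Return ("script", argv) if the bundled script can run, else ("template", paths)."""
    base = Path(base_dir)
    script = base / "scripts" / "run.py"
    python = shutil.which("python3")  # None when python3 is not on PATH
    if python and script.is_file():
        argv = [
            python, str(script),
            "--input", str(input_path),
            "--output", str(output_path),
        ]
        return "script", argv
    # Rule 6 fallback: write the report by hand from the template and spec.
    resources = [
        str(base / "resources" / "template.md"),
        str(base / "resources" / "spec.json"),
    ]
    return "template", resources


mode, detail = plan_run("/nonexistent/skill", "data.csv", "report.md")
print(mode, detail)
```

When the mode is `"script"`, the returned list can be passed to `subprocess.run(argv, check=True)`. Passing arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.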

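## Standard output structure
Organize results in the following structure where possible:
- Dataset overview
- Field summary
- Missing values and anomalies
- Unit and definition risks
- Ingestion recommendations
- Follow-up actions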
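
A minimal sketch of a renderer for this structure follows. It is illustrative only: the real layout lives in `{baseDir}/resources/template.md`, which is not shown here, so the `render_report` name, the English renderings of the section titles, and the "To be confirmed" placeholder for empty sections are assumptions.

```python
SECTIONS = [
    "Dataset overview",
    "Field summary",
    "Missing values and anomalies",
    "Unit and definition risks",
    "Ingestion recommendations",
    "Follow-up actions",
]


def render_report(findings):
    """Render Markdown with every standard section, marking gaps as to be confirmed."""
    lines = []
    for title in SECTIONS:
        lines.append(f"## {title}")
        # Never invent content: an empty section is flagged, not filled in.
        items = findings.get(title) or ["To be confirmed"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


print(render_report({"Dataset overview": ["4 rows, 3 fields"]}))
```

Emitting all six headings even when a section has no findings keeps reports comparable across datasets and makes missing information visible instead of silently omitted.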

## Local resources
- Spec file: `{baseDir}/resources/spec.json`
- Output template: `{baseDir}/resources/template.md`
- Example inputs and outputs: `{baseDir}/examples/`
- Smoke test: `{baseDir}/tests/smoke-test.md`

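## Safety boundaries
- Performs read-only analysis on local files.
- Read-only, auditable, and reversible by default.
- Does not run high-risk commands, hide dependencies, or fabricate facts or results.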
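
One way to enforce a read-only boundary in practice is to confine reads to a dedicated workspace directory. The sketch below is an assumption about how such a guard could look, not a description of what `scripts/run.py` does; the names `resolve_inside` and `read_text_readonly` are hypothetical.

```python
from pathlib import Path


def resolve_inside(workspace, candidate):
    """Return the resolved path if it lies inside workspace, else raise ValueError."""
    root = Path(workspace).resolve()
    target = (root / candidate).resolve()  # collapses ".." and follows symlinks
    if target != root and root not in target.parents:
        raise ValueError(f"{candidate!r} is outside the audit workspace")
    return target


def read_text_readonly(workspace, candidate):
    """Read a file in text mode; the file is never opened for writing."""
    with open(resolve_inside(workspace, candidate), "r", encoding="utf-8") as handle:
        return handle.read()
```

Resolving the path before the containment check means inputs such as `../secrets.env` or an absolute path to a system directory are rejected rather than read.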