Skill details (site mirror, no comments)
Author: Muhammad Mazhar Saeed @0x-professor
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Version: v0.1.0
Stats: ⭐ 0 · 278 · 3 current installs · 3 all-time installs
Package:0x-professor/ml-model-eval-benchmark
Security scan (ClawHub)
- VirusTotal: benign
- OpenClaw: benign
OpenClaw evaluation
The skill is internally consistent with its stated purpose: it contains an instruction doc and a small, local script that reads a JSON payload, computes weighted scores, and writes a leaderboard — no external network, credentials, or unusual installs are requested.
Purpose
Name and description match the included files: SKILL.md, a benchmarking guide, and a Python script that computes weighted scores and rankings. Nothing in the bundle requests unrelated capabilities or credentials.
Instruction scope
The runtime instructions direct the agent to run the bundled script and consult the guide. The script only reads a user-supplied JSON input (size-limited), computes scores, and writes an output artifact. The instructions do not ask the agent to read other system files or environment variables, or to transmit data externally.
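The size-limited read described above could look like the following sketch (the function name and the 1 MiB cap are illustrative assumptions, not taken from the bundled script):

```python
import json
import os

MAX_INPUT_BYTES = 1 * 1024 * 1024  # illustrative cap; the real limit lives in the script


def load_metrics(path: str) -> dict:
    """Read a user-supplied JSON payload, refusing oversized files."""
    size = os.path.getsize(path)
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"input file is {size} bytes, exceeds {MAX_INPUT_BYTES}")
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)
```

Checking the size before parsing keeps a malformed or oversized payload from being loaded into memory at all.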
Install mechanism
No install spec is provided (instruction-only with a bundled script). No downloads, package installs, or external package registry usage are present.
Credentials
The skill declares no environment variables, credentials, or config paths. The script operates solely on an explicit input file and an explicit output path; there are no hidden secret requirements.
Persistence
The `always` flag is false, and the skill does not request persistent system presence or modify other skills. The script writes only to the user-specified output path, creating parent directories as needed.
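The parent-directory behavior noted above is typically a helper of this shape (a sketch under assumed naming, not the script's exact code):

```python
import os


def prepare_output_path(path: str) -> None:
    """Create parent directories for the user-specified output path if needed."""
    parent = os.path.dirname(path)
    if parent:  # a bare filename has no parent to create
        os.makedirs(parent, exist_ok=True)
```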
Overall conclusion
This skill appears low-risk and does what it says: run the bundled script with a JSON input to produce a leaderboard. Before installing/using it: (1) review or run the script locally on non-sensitive sample data to confirm behavior; (2) ensure the input JSON and requested output path are trusted (the script will create parent directories and may overwrite the specified output file); (3) note there are no network calls or credential accesses, s…
Installation (copy to 龙虾 AI)
Copy the entire block below into a 龙虾中文库 conversation; 龙虾 will complete the installation per SKILL.md.
Hand this block to 龙虾中文库 (龙虾 AI) for execution: install the OpenClaw skill 「Ml Model Eval Benchmark」 on this machine. Summary: Compare model candidates using weighted metrics and deterministic ranking outpu…
Fetch the following URL to read SKILL.md and complete the installation per the document: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/0x-professor/ml-model-eval-benchmark/SKILL.md
(Source: yingzhi8.cn skill library)
SKILL.md
---
name: ml-model-eval-benchmark
description: Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
---
# ML Model Eval Benchmark
## Overview
Produce consistent model ranking outputs from metric-weighted evaluation inputs.
## Workflow
1. Define metric weights and accepted metric ranges.
2. Ingest model metrics for each candidate.
3. Compute weighted score and ranking.
4. Export leaderboard and promotion recommendation.
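Steps 3 and 4 above can be sketched as follows (the function names, the tie-break-by-name rule, and the example data are assumptions, not the bundled script's actual logic):

```python
def weighted_score(metrics: dict, weights: dict) -> float:
    """Weighted sum over the metrics named in `weights`; missing metrics count as 0."""
    return sum(weights[name] * metrics.get(name, 0.0) for name in weights)


def rank_candidates(candidates: dict, weights: dict) -> list:
    """Deterministic leaderboard: sort by score descending, then by name for stable ties."""
    scored = [(name, weighted_score(m, weights)) for name, m in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical example: accuracy weighted 0.7, normalized speed weighted 0.3
# (both treated as higher-is-better).
weights = {"accuracy": 0.7, "speed": 0.3}
candidates = {
    "model-a": {"accuracy": 0.90, "speed": 0.50},
    "model-b": {"accuracy": 0.85, "speed": 0.80},
}
leaderboard = rank_candidates(candidates, weights)
```

Sorting on a `(negated score, name)` key is what makes the output deterministic: two candidates with equal scores always appear in the same order regardless of input ordering.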
## Use Bundled Resources
- Run `scripts/benchmark_models.py` to generate benchmark outputs.
- Read `references/benchmarking-guide.md` for weighting and tie-break guidance.
## Guardrails
- Keep metric names and scales consistent across candidates.
- Record weighting assumptions in the output.
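One way to satisfy the second guardrail is to echo the weights alongside the rankings in the exported artifact. The output schema below is an assumption for illustration, not the script's actual format:

```python
import json


def export_leaderboard(path: str, leaderboard: list, weights: dict) -> None:
    """Write rankings together with the weighting assumptions that produced them."""
    artifact = {
        "weights": weights,  # record assumptions next to the result they produced
        "leaderboard": [
            {"rank": i + 1, "model": name, "score": score}
            for i, (name, score) in enumerate(leaderboard)
        ],
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(artifact, fh, indent=2)
```

Embedding the weights in the artifact means any leaderboard can be reproduced or audited without hunting for the configuration that generated it.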