AB Test Setup

Plan A/B tests with a clear hypothesis, defined metrics, variant design, sample size, duration, and statistical significance guidelines.

Development & DevOps

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v1.0.0

Stats: ⭐ 0 · 28 · 0 current installs · 0 all-time installs


🛡 Security scan (ClawHub) · VirusTotal: Pending · OpenClaw: Benign

Package:amdf01-debug/sw-ab-test-setup


OpenClaw Evaluation

This instruction-only skill provides a coherent, self-contained checklist and output template for planning A/B tests; it does not request extra permissions or credentials, and it performs no installs.

Purpose

The name and description (A/B test planning) match the SKILL.md: it covers hypothesis, metrics, sample size, duration, design, and decision rules. The skill does not request unrelated binaries, environment variables, or credentials.

Instruction Scope

Runtime instructions are limited to planning steps, output formatting, and rules for running/deciding on tests. They do not instruct the agent to read local files, access environment variables, call external endpoints, or perform system operations.

Installation Mechanism

No install spec and no code files — the skill is instruction-only, so nothing is written to disk or installed during setup.

Credentials

The skill declares no environment variables, credentials, or config paths and the instructions do not reference any secrets or external service tokens.

Persistence

`always` is false (no forced inclusion). `disable-model-invocation` defaults to false (the agent may call the skill autonomously), which is normal for skills and acceptable here since the skill has no sensitive access.

Overall Conclusion

This skill is an offline planning template for A/B tests and appears self-contained. You can install it safely from an access/permission perspective, but remember: (1) provide accurate traffic and baseline numbers when using the template so sample-size estimates are meaningful; (2) avoid including any personally identifiable information (PII) in examples you pass to the agent; and (3) the agent can invoke the skill autonomously by default — if…

Installation (copy to 龙虾 AI)

Copy the entire block below into a 龙虾中文库 conversation; 龙虾 will complete the installation according to SKILL.md.

Please hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "AB Test Setup" on this machine. Summary: Plan A/B tests with a clear hypothesis, defined metrics, variant design, sample….
Please fetch the following URL, read SKILL.md, and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/amdf01-debug/sw-ab-test-setup/SKILL.md
(Source: yingzhi8.cn skill hub)

SKILL.md

Open the original SKILL.md (GitHub raw)

# A/B Test Setup Skill

## Trigger
Plan A/B tests with proper methodology — hypothesis, sample size, duration, variant design, statistical significance.

**Trigger phrases:** "A/B test", "split test", "experiment", "test this change", "variant", "multivariate test", "hypothesis"

## Process

1. **Hypothesis**: What are you testing and why?
2. **Metrics**: Primary metric, guardrail metrics, success criteria
3. **Design**: Control vs variant(s), what exactly changes
4. **Calculate**: Sample size, test duration, minimum detectable effect
5. **Plan**: Implementation, QA, analysis timeline
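
Step 4 above can be sketched numerically. A minimal sketch using the standard two-proportion normal approximation; the baseline rate, relative MDE, and daily traffic figures below are hypothetical placeholders, not part of the skill:

```python
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_rel, alpha=0.05, power=0.80):
    """Approximate sample size per variant for a two-sided
    two-proportion z-test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)          # rate implied by the relative MDE
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p2 - p1) ** 2)
    return int(n) + 1                      # round up

# Hypothetical inputs: 5% baseline conversion, 10% relative MDE,
# 95% confidence, 80% power (the skill's stated minimums)
n = sample_size_per_variant(0.05, 0.10)

# Duration: both variants draw from the same traffic to the test area
daily_traffic = 4000                       # hypothetical visitors/day
days = 2 * n / daily_traffic
```

With these placeholder numbers the formula yields roughly 31k users per variant, which illustrates why small baseline rates and small MDEs demand long test durations.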

## Output Format

```markdown
# A/B Test Plan: [Name]

## Hypothesis
If we [change], then [metric] will [improve/increase] because [reason].

## Variants
- **Control (A):** [current experience]
- **Variant (B):** [proposed change — be specific]

## Metrics
- **Primary:** [metric] — current: [X%] — target: [Y%]
- **Guardrail:** [metric that should NOT decrease]

## Sample Size & Duration
- MDE: [minimum detectable effect, e.g., 10% relative]
- Sample needed: [N per variant]
- Current traffic: [X visitors/day to test area]
- Estimated duration: [Y days/weeks]
- Confidence level: 95%

## Implementation Notes
[What needs to change, where, any technical considerations]

## Decision Framework
- If primary metric improves ≥ MDE with p < 0.05 → ship variant
- If no significant difference after [duration] → keep control
- If guardrail metric drops > [threshold] → stop test immediately
```
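
The "p < 0.05" condition in the decision framework above can be checked with a pooled two-proportion z-test. A minimal sketch; the conversion counts are hypothetical:

```python
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion
    rates, using the pooled z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: control 500/10000, variant 570/10000 conversions
p = two_proportion_p_value(500, 10_000, 570, 10_000)
ship = p < 0.05   # significant at the 95% confidence level
```

Note that this check only decides significance; the framework still requires the observed lift to meet the MDE and the guardrail metrics to hold before shipping.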

## Rules

- Never run a test without a hypothesis
- One change per test (unless multivariate with sufficient traffic)
- Run for minimum 2 full business cycles (usually 2 weeks)
- Don't peek at results daily — pre-commit to evaluation date
- 95% confidence minimum. 80% power minimum.
- Document everything: future you needs to know why this was tested