Experiment Designer | Skill | openclaw Chinese News Site

Skill details (on-site mirror, no comments)

Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical s...

Development & DevOps

Author: Alireza Rezvani @alirezarezvani

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v2.1.1

Stats: ⭐ 0 · 178 · 1 current installs · 1 all-time installs

🛡 VirusTotal: Benign · OpenClaw: Benign

Package: alirezarezvani/experiment-designer

Security scan (ClawHub)

VirusTotal: Benign
OpenClaw: Benign

OpenClaw assessment

The skill's files and runtime instructions are consistent with an experiment-design helper: it contains documentation and a local sample-size script, asks for no credentials, installs nothing, and does not attempt unexpected access.

Purpose

Name/description (experiment design, hypothesis writing, sample-size estimation) match the included materials: two reference docs and a local sample-size calculator script. No unrelated credentials, binaries, or config paths are requested.

Instruction scope

SKILL.md stays on-topic (hypothesis format, metrics, sample-size estimation, ICE prioritization, stopping rules). The instructions only reference local files included in the package and show how to run the local Python script; they do not direct the agent to read unrelated files or transmit data externally.

Install mechanism

No install spec is present (instruction-only skill with one local script). Nothing is downloaded or extracted from external URLs and no packages are installed automatically.

Credentials

The skill requires no environment variables, no credentials, and no config paths. All functionality is local and proportional to the stated purpose.

Persistence

always is false and the skill is user-invocable. It does not request persistent system-wide changes or elevated privileges.

Overall conclusion

This skill appears to be what it claims: documentation plus a local Python sample-size calculator. Before using: (1) review the sample_size_calculator.py to ensure its assumptions (two-proportion A/B, equal group sizes, interpretation of relative vs absolute MDE) match your experiment; (2) validate results against another calculator or statistical package when stakes are high; and (3) remember this tool does not handle sequential monitoring, m…

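Install (copy to Lobster AI)

Copy the entire block below into a Lobster Chinese Library chat; Lobster will complete the installation by following SKILL.md.

Please hand this block to Lobster Chinese Library (Lobster AI) to execute: install the OpenClaw skill "Experiment Designer" on this machine. Summary: Use when planning product experiments, writing testable hypotheses, estimating ….
Please fetch the following URL to read SKILL.md and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/alirezarezvani/experiment-designer/SKILL.md
(Source: yingzhi8.cn skill library)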

SKILL.md

Open the original SKILL.md (GitHub raw)

---
name: experiment-designer
description: Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.
---

# Experiment Designer

Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.

## When To Use

Use this skill for:
- A/B and multivariate experiment planning
- Hypothesis writing and success criteria definition
- Sample size and minimum detectable effect planning
- Experiment prioritization with ICE scoring
- Reading statistical output for product decisions

## Core Workflow

1. Write hypothesis in If/Then/Because format
- If we change `[intervention]`
- Then `[metric]` will change by `[expected direction/magnitude]`
- Because `[behavioral mechanism]`

2. Define metrics before running test
- Primary metric: single decision metric
- Guardrail metrics: quality/risk protection
- Secondary metrics: diagnostics only

3. Estimate sample size
- Baseline conversion or baseline mean
- Minimum detectable effect (MDE)
- Significance level (alpha) and power

Use:
```bash
python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
```
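
To make the inputs above concrete, here is a minimal sketch of one common way to size a conversion test, assuming a two-sided two-proportion z-test with equal group sizes. The function name `sample_size_per_variant` and its parameters are hypothetical; this is not the code in `scripts/sample_size_calculator.py`, whose formula may differ.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde, mde_type="absolute",
                            alpha=0.05, power=0.8):
    """Approximate per-variant n for a two-sided, two-proportion z-test.

    Illustrative only: assumes equal group sizes and a normal approximation.
    """
    # A relative MDE (e.g. 0.10 = +10% of baseline) becomes an absolute difference.
    delta = mde if mde_type == "absolute" else baseline_rate * mde
    p1 = baseline_rate
    p2 = baseline_rate + delta
    if delta == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")

    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / delta ** 2)


n = sample_size_per_variant(0.12, 0.02, mde_type="absolute")
print(f"per variant: {n}, total: {2 * n}")
```

Halving the MDE roughly quadruples the required sample, which is why the MDE should be tied to a business-relevant effect rather than chosen for convenience.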

4. Prioritize experiments with ICE
- Impact: potential upside
- Confidence: evidence quality
- Ease: cost/speed/complexity

ICE Score = (Impact * Confidence * Ease) / 10
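
The formula above can be applied to a backlog as in the following sketch. It assumes each factor is scored on a 1 to 10 scale, which the skill does not state; the `Idea` class and the example ideas are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: float      # potential upside (assumed 1-10 scale)
    confidence: float  # evidence quality (assumed 1-10 scale)
    ease: float        # cost/speed/complexity, higher = easier (assumed 1-10)

    @property
    def ice(self) -> float:
        # ICE Score = (Impact * Confidence * Ease) / 10, as defined above.
        return (self.impact * self.confidence * self.ease) / 10


def rank(ideas):
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


backlog = [
    Idea("Shorter signup form", impact=7, confidence=6, ease=8),
    Idea("New pricing page", impact=9, confidence=4, ease=3),
    Idea("Checkout trust badges", impact=5, confidence=7, ease=9),
]

for idea in rank(backlog):
    print(f"{idea.name}: {idea.ice:.1f}")
```

Because the factors are multiplied, a single low score (here, low ease on the pricing page) pulls an otherwise high-impact idea down the list.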

5. Launch with stopping rules
- Decide fixed sample size or fixed duration in advance
- Avoid repeated peeking unless you use a method designed for it (e.g., sequential testing)
- Monitor guardrails continuously
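
One way to fix the duration in advance is to derive it from the required sample and expected traffic. The sketch below is illustrative: `planned_duration_days` is a hypothetical helper, and rounding up to whole weeks (to cover weekday/weekend cycles) is an assumption, not a rule stated by this skill.

```python
import math


def planned_duration_days(total_sample_size, daily_eligible_users,
                          round_to_weeks=True):
    """Days needed to reach the total sample at a steady daily traffic level."""
    if total_sample_size <= 0 or daily_eligible_users <= 0:
        raise ValueError("sample size and daily traffic must be positive")
    days = math.ceil(total_sample_size / daily_eligible_users)
    if round_to_weeks:
        # Assumption: run full weeks so every weekday is equally represented.
        days = math.ceil(days / 7) * 7
    return days


# Hypothetical inputs: 9,000 users needed in total, 1,500 eligible users per day.
print(planned_duration_days(9000, 1500))
```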

6. Interpret results
- Statistical significance is not business significance
- Compare point estimate + confidence interval to decision threshold
- Investigate novelty effects and segment heterogeneity
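
As a sketch of comparing the estimate and its interval to a decision threshold, the code below uses a simple Wald (normal-approximation) interval for the difference in conversion rates. The function names, the labels returned by `classify`, and the example counts are all hypothetical; other interval methods may be more appropriate for small samples or extreme rates.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Wald interval for (rate_b - rate_a); returns (estimate, low, high)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


def classify(low, high, threshold):
    """Compare the interval to a practical-significance threshold."""
    if low >= threshold:
        return "meets threshold"
    if high < threshold:
        return "below threshold"
    return "inconclusive"


# Hypothetical counts: 600/5000 conversions in control, 700/5000 in treatment.
diff, low, high = diff_confidence_interval(600, 5000, 700, 5000)
print(f"lift = {diff:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
print(classify(low, high, threshold=0.01))
```

In this example the interval excludes zero, so the result is statistically significant, yet its lower bound sits below the assumed 0.01 threshold, so the business case is still inconclusive.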

## Hypothesis Quality Checklist

- [ ] Contains explicit intervention and audience
- [ ] Specifies measurable metric change
- [ ] States plausible causal reason
- [ ] Includes expected minimum effect
- [ ] Defines failure condition
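
The checklist can be turned into a simple completeness check. This is a hedged sketch: the `Hypothesis` class, its field names, and the example values are hypothetical, and the check only detects empty fields, not whether the content is plausible.

```python
from dataclasses import dataclass, fields


@dataclass
class Hypothesis:
    intervention: str
    audience: str
    metric: str
    expected_change: str    # direction and minimum effect
    mechanism: str          # plausible causal reason
    failure_condition: str

    def missing_fields(self):
        """Names of checklist fields that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Format the hypothesis in If/Then/Because form."""
        return (
            f"If we change {self.intervention} for {self.audience}, "
            f"then {self.metric} will change by {self.expected_change}, "
            f"because {self.mechanism}. "
            f"The hypothesis fails if {self.failure_condition}."
        )


draft = Hypothesis(
    intervention="the signup form from 6 fields to 3",
    audience="new mobile visitors",
    metric="signup conversion",
    expected_change="at least +2 percentage points",
    mechanism="fewer fields reduce effort on small screens",
    failure_condition="",
)
print(draft.missing_fields())
```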

## Common Experiment Pitfalls

- Underpowered tests leading to false negatives
- Running too many simultaneous changes without isolation
- Changing targeting or implementation mid-test
- Stopping early on random spikes
- Ignoring sample ratio mismatch and instrumentation drift
- Declaring success from p-value without effect-size context
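
Sample ratio mismatch, listed above, can be screened with a chi-square goodness-of-fit test on the assignment counts. The sketch below uses only the standard library; `srm_p_value` and the example counts are hypothetical, and the alert threshold in the comment is an assumption that teams set for themselves.

```python
import math


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """P-value of a 1-degree-of-freedom chi-square test on assignment counts."""
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts for an intended 50/50 split.
p = srm_p_value(5000, 5300)
# Assumption: a strict threshold such as 0.001 is used to flag a mismatch.
print(f"SRM p-value: {p:.4f}", "-> investigate" if p < 0.001 else "-> no alert")
```

A very small p-value suggests the split differs from the design, which usually points to an assignment or logging problem rather than a real treatment effect.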

## Statistical Interpretation Guardrails

- A p-value below alpha indicates evidence against the null hypothesis, not proof that the effect is real.
- A confidence interval that crosses zero (no effect) means the direction of the effect is uncertain.
- Wide intervals imply low precision even when significant.
- Use practical significance thresholds tied to business impact.

See:
- `references/experiment-playbook.md`
- `references/statistics-reference.md`

## Tooling

### `scripts/sample_size_calculator.py`

Computes required sample size (per variant and total) from:
- baseline rate
- MDE (absolute or relative)
- significance level (alpha)
- statistical power

Example:
```bash
python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8
```