The Itch
Slash commands are explicit invocation. Subagents are delegation. Skills are the third thing: dormant capabilities that wake up when the conversation calls for them. The frontmatter description is a trigger; the rest of the directory is whatever Claude needs to fulfill it.
Drop a skill under ~/.claude/skills/<name> and Claude Code picks it up across every project. Drop it under <project>/.claude/skills/<name> and it's scoped to that one repo. No config file edits, no plugin registry. The directory is the unit.
This post walks through two skills I use every day. One turns any content into spoken audio. The other lets me drive coding sessions from my phone over Slack or iMessage when I'm away from the keyboard. Both are open source. Both fit in one directory. Both teach a pattern you can lift.
What a Skill Actually Is
A skill is a directory containing a SKILL.md with YAML frontmatter, plus any scripts or reference docs it needs to execute.
Claude does not read every skill body on every turn. It reads only the frontmatter (name + description) and decides whether the user's request matches. If it does, the body of SKILL.md loads in, along with any partials it points at. The scripts in the directory are now on PATH (so to speak) for the model to invoke via Bash.
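Concretely, here's roughly what the /tts directory covered below looks like on disk (illustrative: the reference/ folder name is an assumption, and only SKILL.md is mandatory):

tts/
  SKILL.md         frontmatter trigger plus the workflow body
  tts.js           the dispatcher Claude invokes via Bash
  tts_kokoro.py    the self-bootstrapping local backend
  reference/       partials loaded on demand via @reference/...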
The discipline: a skill's description is its API. If it's vague, it never fires. If it's precise, it fires at the right moments.
The Anatomy of SKILL.md
Here's the frontmatter from /tts:
---
name: tts
description: >-
  Convert any content (file, URL, text, image, PDF) into a
  spoken audio narration. Defaults to local Kokoro (free, no API key,
  no watermark); pass `--gemini` to use the Gemini 2.5 Flash TTS API.
  Use when the user says "read this to me", "tts", "text to speech",
  "explain this out loud", "make an audio version", or wants to listen
  to content rather than read it. Accepts file paths, URLs, raw text,
  images, or any combination. Flags: `--voices` prints the catalog
  and exits; `--gemini` switches backends.
---
Three things to notice. First, the description names the trigger phrases verbatim ("read this to me", "tts", "text to speech"). Claude doesn't do fuzzy paraphrase matching well; explicit is better. Second, it lists the inputs it accepts so Claude knows it can pass a URL or a PDF, not just a string. Third, it documents the flags inline so Claude doesn't have to read the body to know `--gemini` exists.
Treat the description like an interface. Every phrase that should match the user's intent goes in quotes. Every flag goes in backticks. Every input type gets named. If you have to think about whether Claude will pick the skill up, the description is too vague.
The body of SKILL.md is whatever instructions Claude needs to actually execute the workflow once the description has fired. For /tts it's a four-phase plan: preflight, extract content, rewrite for spoken delivery, synthesize. The body can include partials via @path/to/file.md so you don't blow the context window on reference material that's only needed in specific paths.
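Here's the shape one phase of a body might take. The partial path and exact wording are made up; the style targets are the real Phase 2 rules described below:

## Rewrite for spoken delivery
Rewrite the extracted content as prose meant to be heard, following
@reference/spoken-style.md. Target roughly 150 words per minute, expand
jargon on first use, and summarize file paths instead of reading them
verbatim.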
Case Study 1: /tts (Content to Audio)
/tts takes any input and produces spoken narration. The interesting part isn't the audio synthesis (Kokoro and Gemini do that). It's Phase 2, where Claude rewrites the source into something that actually sounds like a person.
The Four Phases
Why split synthesis from rewriting? Because raw markdown is unlistenable. Bullet points become "dot, dot, dot." URLs get read character by character. Code blocks turn into syntactic mush. The Phase 2 rewrite strips all of that and produces prose that flows when spoken: explicit transitions, expanded jargon, summaries instead of exact paths. About 150 words per minute, written for the ear.
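A made-up before/after (not from the skill's docs) shows what the rewrite does:

Source: "- Config lives in ~/.config/app/settings.toml (see https://example.com/docs)."
Narration: "Configuration lives in a settings file under your home directory; the project's documentation site has the details."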
Local First, Cloud as Fallback
Default backend is Kokoro: an 82M-parameter open-weights TTS model that runs locally via uv. First run pulls about 340MB of wheels and weights; every run after that is fully offline. MIT license, no watermark, no audible "I am an AI" disclaimer. Cost: zero.
Pass --gemini and the dispatcher routes through Google's 2.5 Flash TTS instead. Useful when uv isn't installed, when you want a specific Gemini voice, or for very long content where end-to-end synthesis time matters more than privacy.
# Default: local Kokoro
node "$SKILL_DIR/tts.js" /tmp/tts-script.txt /tmp/tts-output.wav --play
# Cloud fallback
GOOGLE_API_KEY="..." node "$SKILL_DIR/tts.js" \
/tmp/tts-script.txt /tmp/tts-output.wav --play --gemini
# Voice catalog
node "$SKILL_DIR/tts.js" --voices # Kokoro
node "$SKILL_DIR/tts.js" --voices --gemini # GeminiStreaming the Audio
With --play the dispatcher streams audio chunk-by-chunk through afplay while later chunks are still synthesizing. First sound usually arrives within five seconds even on the Kokoro path. The combined WAV is also written to disk so you can replay later without re-synthesizing.
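The overlap is simple enough to sketch. This is not tts.js (that's Node); it's the shape of the idea in Python, with synthesize standing in for a blocking Kokoro or Gemini call that writes a WAV and returns its path:

import subprocess

def stream_chunks(chunks, synthesize):
    # Play chunk N with afplay while chunk N+1 is still synthesizing.
    player = None
    for i, text in enumerate(chunks):
        wav = synthesize(text, f"/tmp/tts-chunk-{i}.wav")  # blocking synthesis
        if player is not None:
            player.wait()                    # let the previous chunk finish playing
        player = subprocess.Popen(["afplay", wav])  # non-blocking playback
    if player is not None:
        player.wait()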
The Kokoro path delegates to a self-bootstrapping uv script. The whole Python helper looks like this on disk:
#!/usr/bin/env -S uv run --quiet --script
# /// script
# requires-python = ">=3.10,<3.13"
# dependencies = [
#     "kokoro>=0.9",
#     "soundfile",
#     "numpy",
# ]
# ///
No requirements.txt, no virtualenv setup steps in the README, no pre-install instructions. The shebang declares its own dependencies via PEP 723 inline metadata; uv resolves them on first run and caches them. The skill ships as exactly two executable files (tts.js + tts_kokoro.py) plus three docs.
Case Study 2: /afk (Session Protocol)
Where /tts is a one-shot transform, /afk is a long-running stateful protocol. You tell Claude where you're going (/loop /afk ship the routing fix), Claude posts a Slack thread, and you drive the rest from your phone. Milestones get posted, questions get asked, replies get parsed. No babysitting.
The Core Idea
One session equals one thread on the remote channel. The thread root is the session header. Every status update, question, and reply lives inside that thread. A local state file ($PWD/.afk/session.json) pins this session to its thread, so re-entering /afk from the same worktree resumes the same conversation instead of starting a new one.
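In sketch form (field names match the state file shown below; the post_thread_root helper is hypothetical):

import json
from pathlib import Path

STATE = Path(".afk/session.json")

def load_or_create_session(post_thread_root):
    if STATE.exists():
        # Same worktree, existing session: resume the pinned thread.
        return json.loads(STATE.read_text())
    session = {"transport": "slack", "thread_ts": post_thread_root(),
               "last_seen_ts": "0", "empty_wake_streak": 0, "status": "active"}
    STATE.parent.mkdir(exist_ok=True)
    STATE.write_text(json.dumps(session, indent=2))
    return session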
The Sentinel Prefix
Slack's MCP plugin posts as the user, so sender metadata can't distinguish Claude's posts from yours. The skill solves this with a sentinel prefix: every outbound post starts with 🤖 . Anything in the thread without that prefix is treated as user input.
This sounds trivial. It is also load-bearing. Without the sentinel, Claude reads its own posts back as user replies on the next wake and the loop eats itself. iMessage has the same problem more severely: if your Mac and iPhone are on the same Apple ID, every message in chat.db shows up as is_from_me=1. The sentinel is the only reliable filter.
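The filter itself is a couple of lines. A sketch, assuming Slack-style message dicts with ts and text fields (the shape is an assumption, not /afk's actual code):

SENTINEL = "🤖 "

def user_replies(messages, last_seen_ts):
    # Keep only messages that are new since the last wake and that
    # Claude did not write itself.
    new = [m for m in messages if float(m["ts"]) > float(last_seen_ts)]
    return [m for m in new if not m["text"].startswith(SENTINEL)]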
Polling via /loop and ScheduleWakeup
Skills can't poll on their own. The wait-for-reply loop relies on /loop dynamic mode plus the ScheduleWakeup tool, which schedules a re-entry into the conversation at a chosen delay (clamped to 60s minimum, 3600s maximum). The intended invocation is:
/loop /afk # resume or inherit context
/loop /afk <task> # explicit task
/loop /afk --transport=imessage <task>  # SMS/iMessage instead
The /loop harness re-enters the skill on each wake. The skill just does the current tick: read the thread since last_seen_ts, act on any new input, post a milestone if there's progress, decide on the next delay via ScheduleWakeup. Backoff on empty wakes goes 60s, 90s, 270s, 900s, 1800s, deliberately skipping the 300-to-900s prompt-cache dead zone (Anthropic's prompt cache has a 5-minute TTL, so 300s is the worst of both worlds: you pay the cache miss without amortizing it).
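The schedule fits in a few lines; the numbers and the clamp come straight from the list above:

BACKOFF = [60, 90, 270, 900, 1800]  # note the jump from 270 over the 300-900s dead zone

def next_delay(empty_wake_streak: int) -> int:
    delay = BACKOFF[min(empty_wake_streak, len(BACKOFF) - 1)]
    return max(60, min(delay, 3600))  # ScheduleWakeup's clamp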
The State File
{
  "transport": "slack",
  "thread_ts": "1777572000.001100",
  "channel_id": "D01234ABCDE",
  "task": "ship the routing fix",
  "label": "myapp · routing-fix",
  "started_at": "2026-04-30T18:00:00Z",
  "last_seen_ts": "1777572123.002200",
  "empty_wake_streak": 0,
  "monitor_task_id": null,
  "status": "active"
}
The state file is in the worktree, not in ~. That's the multi-session story: you can have ten cmux worktrees each running /afk in parallel, each one with its own thread and its own state, and Claude triages whichever thread you reply to next without confusion. The thread_ts pinned in the state file is authoritative for that worktree, full stop.
The interesting design choice is not "post to Slack." It's the no-silent-tick rule. If you wake up, do nothing user-visible, and go back to sleep three times in a row, the user's phone screen looks like the session died. The skill spends every wait window doing parallel work it can ship on the next tick, and posts a heartbeat at long delays so the thread shows it's still alive.
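In sketch form, with thresholds that are assumptions rather than /afk's exact numbers:

def should_post_heartbeat(posted_this_tick: bool, next_delay: int,
                          empty_wake_streak: int) -> bool:
    # No-silent-tick rule: if nothing user-visible happened and the next
    # sleep is long (or the last few wakes were quiet), post a heartbeat.
    if posted_this_tick:
        return False
    return next_delay >= 900 or empty_wake_streak >= 3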
Patterns Worth Stealing
Both skills are small (one directory, fewer than a dozen files). Both are doing more than they look like they're doing. Four patterns transfer.
The Description Is the API
Both skills name their trigger phrases verbatim. /tts lists "read this to me", "explain this out loud", "make an audio version". /afk lists "going afk", "ping me on slack when", "I'll be on my phone". If a phrase exists in your description, it fires. If it doesn't, it doesn't. Don't hope the model paraphrases.
Self-Bootstrapping Scripts
/tts's Python helper declares its dependencies inline via PEP 723. No README install steps. No virtualenv. The skill ships as a flat directory and the script handles its own environment. This is the right ergonomics target: a user clones the repo into ~/.claude/skills/<name> and it works.
Local State, Worktree-Scoped
/afk stores session state in $PWD/.afk/session.json, not in ~. That's how parallel sessions across worktrees stay isolated. If you're building a skill that needs to remember anything across re-entries, prefer per-worktree state files. Append .afk/ (or your equivalent) to the project's .gitignore on first run so the receipt doesn't leak into commits.
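The first-run housekeeping is small enough to show whole. A sketch, not /afk's actual code:

from pathlib import Path

def ignore_state_dir(entry: str = ".afk/"):
    gitignore = Path(".gitignore")
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry not in lines:
        with gitignore.open("a") as f:
            f.write(f"{entry}\n")  # keep the session receipt out of commits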
Local-First, Cloud-Fallback Backends
/tts's Kokoro/Gemini split is the template. Default to the option that respects the user's privacy and wallet. Provide a flag to switch to the cloud option when local prerequisites aren't there. Document the tradeoff in the skill body so Claude can explain it if asked.
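The split reduces to a few lines of dispatch. A Python sketch of the shape (tts.js is Node; the names here are illustrative):

import shutil
import sys

def pick_backend(argv: list[str]) -> str:
    if "--gemini" in argv:
        return "gemini"    # explicit cloud opt-in
    if shutil.which("uv") is None:
        sys.exit("local Kokoro needs uv; install it or pass --gemini")
    return "kokoro"        # local-first default: free, offline, no watermark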
Try It
Both skills are MIT-licensed. /tts is a single git clone; /afk adds a small Slack or iMessage config file.
# User-scoped (recommended)
git clone https://github.com/wickdninja/tts.git ~/.claude/skills/tts
# Project-scoped
git clone https://github.com/wickdninja/tts.git \
<project>/.claude/skills/tts
# Then in Claude Code:
# /tts ./README.md
# /tts https://example.com/post
# /tts --gemini ./long-doc.md
# /tts --voices
# User-scoped
git clone https://github.com/wickdninja/afk.git ~/.claude/skills/afk
# Configure Slack (or iMessage)
mkdir -p ~/.claude/afk
cat > ~/.claude/afk/slack.json <<'JSON'
{
  "user_id": "U01234ABCDE",
  "channel_id": "D01234ABCDE",
  "workspace_host": "yourworkspace.slack.com"
}
JSON
# Then in Claude Code:
# /loop /afk ship the routing fix
# /loop /afk --transport=imessage ship it
/tts needs node and uv for the default Kokoro path: brew install uv node. Or skip uv and use --gemini with a GOOGLE_API_KEY.
/afk needs the Slack MCP plugin authenticated for the default transport, or macOS Full Disk Access plus Automation permission for the iMessage transport.
Resources
The two skills, the platform docs, and the upstream pieces they depend on.
The Skills
/tts (github.com/wickdninja/tts): Convert any content into spoken audio. Local Kokoro by default; Gemini 2.5 Flash TTS as cloud fallback. MIT.
/afk (github.com/wickdninja/afk): Drive Claude Code sessions from your phone via Slack or iMessage. One thread per session, sentinel-based disambiguation, /loop-powered polling. MIT.
Claude Code Platform
Claude Code skills docs: Official documentation for skill structure, frontmatter, partials, and the loading model.
Claude Code: The CLI itself. Skills only do anything inside a Claude Code session.
Claude Agent SDK: For building an agent that uses skills programmatically rather than through the CLI.
Upstream Dependencies
Kokoro: The MIT-licensed open-weights TTS model that powers /tts's local backend.
uv: Astral's Python package and project manager. PEP 723 inline metadata is what makes /tts's Python helper self-bootstrapping.
Gemini 2.5 Flash TTS: Google's speech synthesis API. The cloud fallback for /tts.
The Slack MCP plugin /afk uses for its default transport.
