# AGENTS.md — Transcribe API for AI Agents

This is a guide for AI agents and automated clients calling **usetranscribe.io**.
The same file is served at `https://www.usetranscribe.io/AGENTS.md` (and also at `/llms.txt`).

## What this service does

Turns a YouTube URL into:
- A timestamped transcript (with speaker labels when ASR is used)
- A GPT-generated summary
- A shareable permalink

Transcripts are cached. Re-requesting a URL returns the cached row instantly.

## Base URL

```
https://www.usetranscribe.io
```

Use the `www.` prefix. The apex (`https://usetranscribe.io`) currently redirects in a way that may not preserve request paths for non-browser clients — agents calling the apex can land on the homepage instead of the requested endpoint.

All endpoints are HTTP GET unless noted. No API key required today. Be polite — see rate limits below.

## Endpoints

### 1. Check if a URL is already transcribed

```
GET /api/check?platform=youtube&id={video_id}
```

`video_id` is the YouTube video ID (the 11-char string after `?v=`).

Note: cached pre-Nov-2025 transcripts for Spotify episodes are still served at `/sp/{episode_id}` for legacy share links, but new Spotify URLs are rejected at submission.

Response:
```json
{ "cached": true, "permalink": "/yt/DULfEcPR0Gc/some-slug" }
```
or
```json
{ "cached": false }
```

Use this before triggering a transcribe — saves time and quota.
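
The `id` parameter has to be pulled out of the video URL before calling `/api/check`. Below is a minimal sketch of that step. The helper name `youtube_video_id` is hypothetical, and it assumes standard `watch?v=` URLs and an ID alphabet of letters, digits, `_` and `-`; other URL shapes (short links, embeds) are deliberately not handled.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are exactly 11 chars drawn from A-Z, a-z, 0-9, "_" and "-".
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def youtube_video_id(url: str) -> Optional[str]:
    """Return the 11-char ID after `?v=` in a YouTube watch URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Only accept youtube.com and its subdomains (www., m., music., ...).
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return None
    candidate = parse_qs(parsed.query).get("v", [None])[0]
    if candidate and _ID_RE.match(candidate):
        return candidate
    return None
```

The returned value can be passed directly as `id` in `GET /api/check?platform=youtube&id=...`; a `None` result means the URL should not be submitted as-is.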

### 2. Fetch a cached transcript

```
GET /yt/{video_id}?format=json
GET /sp/{episode_id}?format=json
```

Returns:
```json
{
  "platform": "youtube",
  "external_id": "DULfEcPR0Gc",
  "slug": "ai-agents-explained",
  "permalink": "https://www.usetranscribe.io/yt/DULfEcPR0Gc/ai-agents-explained",
  "title": "...",
  "creator": "...",
  "duration_seconds": 3000,
  "thumbnail_url": "...",
  "source_url": "https://www.youtube.com/watch?v=...",
  "published_at": null,
  "transcript": {
    "language": "en",
    "segments": [
      { "start": 0.0, "end": 4.2, "text": "..." },
      { "start": 4.2, "end": 8.6, "text": "...", "speaker": "Speaker 1" }
    ]
  },
  "summary": "## TL;DR\n...",
  "pipeline_version": "v1:assemblyai-best:...",
  "view_count": 12,
  "created_at": 1778968685
}
```

> **Schema notes:**
> - Segments are nested under `transcript.segments` (NOT top-level). Language is `transcript.language`.
> - The summary field is `summary` — NOT `summary_md` (that name is only used in the SSE `done` event).
> - The `speaker` field on a segment only appears when ASR with diarization was used. Captions-path transcripts (the YouTube fast-path) have no speaker field at all.

Other formats:
- `?format=md` → Plain-text markdown (paragraph-grouped, speaker-labeled when applicable)
- No `format` param → HTML permalink page (intended for humans)
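
For agents that consume `?format=json`, the sketch below turns the nested `transcript.segments` array into timestamped lines and treats `speaker` as optional, as the schema notes describe. The function names `fmt_ts` and `render_segments` are hypothetical, and the `[M:SS]` output style is just one possible choice.

```python
def fmt_ts(seconds: float) -> str:
    """Format a start offset in seconds as M:SS (minutes are not capped at 59)."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def render_segments(doc: dict) -> list:
    """Render the cached-transcript JSON as one text line per segment."""
    lines = []
    for seg in doc["transcript"]["segments"]:
        # `speaker` is absent on captions-path transcripts, so use .get().
        speaker = seg.get("speaker")
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{fmt_ts(seg['start'])}] {prefix}{seg['text']}")
    return lines
```

Passing the parsed response of `GET /yt/{video_id}?format=json` to `render_segments` yields a list that can be joined with newlines for prompting or display.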

### 3. Browse recently transcribed

```
GET /api/feed?limit=50&offset=0
```

Returns the homepage feed — useful for discovery / topical sampling.

```json
{ "rows": [{ "platform": "youtube", "external_id": "...", "title": "...", ... }] }
```
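
Paging through the feed with `limit` and `offset` could look like the sketch below. The function name `iter_feed` and the `fetch_page` callable are hypothetical, and the stopping rule (an empty or short page means the end of the feed) is an assumption, since the response shown above carries no total count.

```python
from typing import Callable, Iterator


def iter_feed(fetch_page: Callable[[int, int], dict],
              limit: int = 50,
              max_rows: int = 200) -> Iterator[dict]:
    """Yield feed rows page by page, stopping at max_rows or the end of the feed.

    fetch_page(limit, offset) must return the parsed JSON of /api/feed.
    """
    offset = 0
    yielded = 0
    while yielded < max_rows:
        rows = fetch_page(limit, offset).get("rows", [])
        if not rows:
            return  # assumption: an empty page means there is nothing more
        for row in rows:
            yield row
            yielded += 1
            if yielded >= max_rows:
                return
        if len(rows) < limit:
            return  # assumption: a short page is the last page
        offset += limit
```

In practice `fetch_page` would wrap something like `httpx.get(f"{BASE}/api/feed", params={"limit": limit, "offset": offset}).json()`. The `max_rows` cap keeps a sampling job from walking the whole feed.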

### 4. Trigger a new transcribe (SSE)

```
GET /transcribe?url={URL}&summarize=1
```

This is a **Server-Sent Events** stream. The connection stays open for the full job (typically 1-5 minutes; rule of thumb: ~1 minute per 15 minutes of source audio).

Event sequence:
1. `event: stage` — progress markers (`validating`, `resolving`, `transcribing`, `summarizing`)
2. `event: meta` — once, after metadata is resolved (title, duration, thumbnail)
3. `event: done` — final payload (full transcript + summary + permalink)
4. `event: error` — on failure (codes: `unsupported_url`, `too_long`, `auth_required`, `proxy_unavailable`, `metadata_failed`, `transcription_failed`)

`done` payload:
```json
{
  "permalink": "https://www.usetranscribe.io/yt/abc123/some-slug",
  "segments": [...],
  "summary_md": "...",
  "metadata": { "title": "...", "duration_seconds": 3000, ... },
  "language": "en",
  "source": "captions" | "audio_url" | "audio_file"
}
```

> **Note on `permalink` shape**: `/api/check` returns `permalink` as a **path** (e.g. `/yt/abc/slug`). The SSE `done` event returns `permalink` as a **full absolute URL**. Always check whether the value starts with `http`; don't blindly concatenate with the base URL.

**Recommended pattern**: check `/api/check` first; only call `/transcribe` on a cache miss.
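
The stream can be consumed with a small generic SSE reader. The sketch below follows the standard SSE framing (fields grouped into events separated by blank lines) and assumes, as the documented payloads suggest, that `done` and `error` data are JSON with `code` and `message` on errors. The names `iter_sse` and `wait_for_done` are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Group raw SSE lines into (event_name, data_string) pairs."""
    event, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event, "\n".join(data_parts)
            event, data_parts = "message", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_parts.append(value[1:] if value.startswith(" ") else value)
    if data_parts:
        # Flush a final event that was not followed by a blank line.
        yield event, "\n".join(data_parts)


def wait_for_done(lines: Iterable[str]) -> dict:
    """Return the parsed `done` payload, or raise on `error` / early close."""
    for event, data in iter_sse(lines):
        if event == "done":
            return json.loads(data)
        if event == "error":
            err = json.loads(data)
            raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    raise RuntimeError("stream ended without a done or error event")
```

With httpx, the result of `response.iter_lines()` on a streamed `GET /transcribe` request can be passed straight to `wait_for_done`. Other events (`stage`, `meta`) are skipped here; a caller that wants progress updates can loop over `iter_sse` directly.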

### 5. Ask a question about a transcript (SSE)

```
POST /yt/{video_id}/ask
POST /sp/{episode_id}/ask
```

Stream an answer to a free-form question about a transcript. The model has the full transcript in context and returns timestamp citations as `[M:SS]` (e.g. `[2:14]`) that reference moments in the transcript.

**Request body** (JSON):

```json
{
  "question": "What is the main argument?",
  "thread": [
    {"role": "user",      "content": "Prior question"},
    {"role": "assistant", "content": "Prior answer"}
  ]
}
```

- `question` (required, string, ≤500 chars) — the question to ask.
- `thread` (optional, default `[]`) — prior conversation turns for follow-up questions. Each turn has `role` (`"user"` or `"assistant"`) and `content` (string). Per-turn content auto-truncated to 2,000 chars server-side; only the last 10 turns are kept. Raw total payload cap: 200,000 chars.
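
A client can mirror these limits before sending, which avoids an avoidable `400` and keeps payloads small. This is a sketch under the limits listed above; the helper name `build_ask_payload` is hypothetical, and the server applies its own truncation regardless.

```python
from typing import List, Optional

MAX_QUESTION_CHARS = 500   # documented cap on `question`
MAX_TURN_CHARS = 2000      # documented per-turn truncation
MAX_TURNS = 10             # documented number of turns kept


def build_ask_payload(question: str, thread: Optional[List[dict]] = None) -> dict:
    """Validate the question and trim the thread to the documented limits."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be blank")
    if len(question) > MAX_QUESTION_CHARS:
        raise ValueError(f"question exceeds {MAX_QUESTION_CHARS} chars")

    turns = []
    # Keep only the most recent turns, oldest first.
    for turn in (thread or [])[-MAX_TURNS:]:
        if turn.get("role") not in ("user", "assistant"):
            raise ValueError(f"invalid role: {turn.get('role')!r}")
        turns.append({
            "role": turn["role"],
            "content": str(turn.get("content", ""))[:MAX_TURN_CHARS],
        })
    return {"question": question, "thread": turns}
```

The returned dict can be sent as the JSON body of `POST /yt/{video_id}/ask`, for example via the `json=` argument of an httpx request.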

**Response** — `text/event-stream`:

```
data: {"type": "token", "text": "The "}
data: {"type": "token", "text": "speaker "}
data: {"type": "token", "text": "argues "}
data: {"type": "token", "text": "that "}
data: {"type": "token", "text": "[2:14]"}
...
data: {"type": "done"}
```

On error: `data: {"type": "error", "message": "..."}` then the stream closes.
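
Reassembling the answer from this stream and resolving its citations might look like the following sketch. The function names are hypothetical, and the citation regex assumes only the `[M:SS]` form described above (no hour component).

```python
import json
import re
from typing import Iterable, List

# Matches citations such as [2:14]; minutes may run past 59.
CITATION_RE = re.compile(r"\[(\d+):([0-5]\d)\]")


def collect_answer(lines: Iterable[str]) -> str:
    """Concatenate `token` messages until `done`; raise on an `error` message."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        msg = json.loads(line[len("data:"):].strip())
        if msg["type"] == "token":
            parts.append(msg["text"])
        elif msg["type"] == "error":
            raise RuntimeError(msg["message"])
        elif msg["type"] == "done":
            break
    return "".join(parts)


def citation_seconds(answer: str) -> List[int]:
    """Convert every [M:SS] citation in the answer to an offset in seconds."""
    return [int(m) * 60 + int(s) for m, s in CITATION_RE.findall(answer)]
```

The offsets can then be matched against the `start` values in `transcript.segments` to locate the cited passage.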

**Status codes:**
- `200` — SSE stream
- `400` — invalid JSON body, question missing/blank/too long, or malformed thread
- `404` — transcript not found
- `410` — transcript blocked
- `429` — Q&A quota exhausted (separate from transcription quota — see Rate limits)

### 6. Other formats on a permalink

```
GET /yt/{video_id}/transcript.pdf   # PDF download
GET /yt/{video_id}/og.png           # Open Graph card image
```

## Minimal Python example

```python
import httpx, json

BASE = "https://www.usetranscribe.io"
yt_url = "https://www.youtube.com/watch?v=DULfEcPR0Gc"
video_id = "DULfEcPR0Gc"


def absolutize(permalink: str) -> str:
    """`/api/check` returns a path, `done` event returns a full URL — handle both."""
    return permalink if permalink.startswith("http") else f"{BASE}{permalink}"


# 1. Cache check
r = httpx.get(f"{BASE}/api/check", params={"platform": "youtube", "id": video_id})
r.raise_for_status()  # fail fast on non-2xx responses
check = r.json()      # parse the body once and reuse it
if check["cached"]:
    url = absolutize(check["permalink"])
    data = httpx.get(url, params={"format": "json"}).json()
    # Cached JSON shape: summary is `summary`, segments are nested under transcript.
    print(data["summary"])
    print(f"{len(data['transcript']['segments'])} segments")
else:
    # 2. Trigger transcribe, parse SSE
    with httpx.stream("GET", f"{BASE}/transcribe",
                      params={"url": yt_url, "summarize": 1},
                      timeout=600) as s:
        event = None
        for line in s.iter_lines():
            if line.startswith("event:"):
                event = line.split(":", 1)[1].strip()
            elif line.startswith("data:") and event == "done":
                data = json.loads(line.split(":", 1)[1].strip())
                # SSE `done` shape differs from cached read: summary is
                # `summary_md`, segments are top-level (not nested).
                print("Permalink:", absolutize(data["permalink"]))
                print(data["summary_md"])
                print(f"{len(data['segments'])} segments")
                break
            elif line.startswith("data:") and event == "error":
                err = json.loads(line.split(":", 1)[1].strip())
                raise RuntimeError(f"{err['code']}: {err['message']}")
```

## Rate limits

Enforced per source IP and per session cookie:

| Limit | Value |
|---|---|
| Concurrent transcribes per IP | 2 |
| Daily transcribes per IP | 50 |
| Daily transcribes per session | 50 |
| Daily Q&A questions per IP | 50 |

Exceeded limits return HTTP 429 with `{"error": "rate_limit", "scope": "ip|ip_concurrent|session|qa_ip"}`. Q&A questions use a separate `qa_ip` counter — asking questions does not consume transcription quota and vice versa.

**Hosted agents take note**: many users behind one egress IP share the same daily budget. There is no API key tier today.
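
An agent can branch on the `scope` field of a 429 body. The sketch below is one possible reading: it treats `ip_concurrent` as a short-lived condition that clears when a running job finishes, and the other scopes as daily budgets that should stop further attempts. The function name and the returned labels are hypothetical, and when daily counters reset is not documented here.

```python
def rate_limit_action(body: dict) -> str:
    """Map a 429 response body to a coarse client-side action label."""
    if body.get("error") != "rate_limit":
        return "not_rate_limit"
    scope = body.get("scope")
    if scope == "ip_concurrent":
        # Assumption: a slot frees up once one of the running jobs completes.
        return "wait_for_running_job"
    if scope in ("ip", "session", "qa_ip"):
        # Daily budget used up; retrying immediately will not help.
        return "daily_budget_exhausted"
    return "unknown_scope"
```

For example, `rate_limit_action({"error": "rate_limit", "scope": "qa_ip"})` signals that only Q&A calls should stop, since transcription quota is counted separately.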

## Constraints

- **Max source duration**: 90 minutes. Longer content fails with an SSE `error` event whose code is `too_long`.
- **Supported platforms**: YouTube only. Spotify support was removed in Nov 2025; cached Spotify rows remain accessible at `/sp/{episode_id}` for legacy share links, but new Spotify URLs are rejected. X (Twitter) was previously listed but is not supported either.
- **Speaker labels**: enabled automatically when ASR is used (the YouTube fallback when captions are unavailable). The YouTube fast-path uses platform captions, which carry no diarization; the permalink page exposes an "Add speaker labels" button for a one-way upgrade.
- **Language**: ASR is English-tuned; non-English content may degrade.

## Etiquette

- Always cache-check before transcribing.
- Don't retry on `too_long`, `unsupported_url`, or `auth_required` — these won't change.
- On `metadata_failed` or `transcription_failed`, back off exponentially; transient datacenter-IP throttling does happen.
- Identify your agent with a descriptive `User-Agent` header if possible.
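
The retry rules above can be encoded as a small policy function. This is a sketch, not a prescribed schedule: the name `retry_delay`, the 5-second base, the 300-second cap and the attempt limit are all assumptions. Codes the list does not mention (such as `proxy_unavailable`) are left unretried here by default.

```python
from typing import Optional

NO_RETRY = {"too_long", "unsupported_url", "auth_required"}
BACKOFF = {"metadata_failed", "transcription_failed"}


def retry_delay(code: str, attempt: int, base: float = 5.0,
                cap: float = 300.0, max_attempts: int = 4) -> Optional[float]:
    """Return seconds to wait before retry number `attempt` (0-based), or None to stop."""
    if code in NO_RETRY or attempt >= max_attempts:
        return None
    if code in BACKOFF:
        # Exponential backoff: base, 2*base, 4*base, ... bounded by cap.
        return min(cap, base * 2 ** attempt)
    return None  # unlisted codes: do not retry automatically
```

A caller would sleep for the returned number of seconds and resubmit, and give up as soon as the function returns `None`.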

## Questions / issues

This is a hobby project run by gokulr@gmail.com. No SLA. Open an issue at the GitHub repo if you find a bug.