Dev log

What's new on the board

Chalkboard is in closed beta. The hosted pipeline runs at chalkboard.studio, invite-only for now. If you'd like in, drop your email on the waitlist. If you'd rather not wait, the open-source repo runs the same pipeline on your own machine. Entries are grouped by week up through the hosted launch; from v0.1.2 onward, each release gets its own tagged entry. For the commit-level view, see GitHub.

nothing here is stable yet. APIs, flags, and on-disk layouts still move between entries.

v0.2.0
May 29, 2026
"credits & control"
Latest on main

The stretch that turned the pipeline into a product. Chalkboard now runs on credits: you see the cost before you submit, spend is metered transparently, and your balance is yours to manage. There are premium narrator voices, a noticeably sharper visual style on every video, an upgrade to the latest Claude, and real account and data controls (export everything, delete everything). The private beta is next.

  • Credits. Every generation now costs credits, and the cost is shown up front. /generate estimates a live range before you submit, with a breakdown of what drives it. Your balance is reserved when a run starts and reconciled when it finishes, and a run won't begin if you can't cover it, so nothing fails halfway with the meter already running. The API enforces the same check (a clean 402 before any spend if you're short). See /pricing for the model.
  • Premium narrator voices. A default voice plus four premium, natural-sounding narrators, on /generate and the API (narrator=). The cost line updates to reflect the choice.
  • Promo codes. Redeem a code for credits at /redeem. This is how invites and launch credits get handed out during the beta.
  • Account & data controls. Manage all your email notifications in one place, set your own spend-alert threshold, export everything we hold on you, and delete your account (cascading across your videos, jobs, and history). Password resets and every notification now come from Chalkboard's own domain, in the site's voice.
  • A sharper house style. A reusable, code-backed design system now drives every video, with consistent color, type, spacing, motion, and layout, plus guardrails that catch overflowing or colliding elements before a scene renders. Videos look more deliberate and less generated.
  • Upgraded to Claude Opus 4.8, the newest and strongest model, across the generation pipeline. Sharper scripts and code, better first-pass results.
  • Much more accurate cost and time estimates. We measured real runs end to end and recalibrated every estimate (and the credit reservation) to match, including the render step itself, not just the AI. The number you see before submitting is now a close, slightly conservative upper bound.
  • Refreshed create page and navigation. /generate and the logged-in nav got a cleaner, more compact redesign.
  • Status reads in plain language. The health chip and /status now describe capabilities (Generation, Narration, Delivery) instead of naming providers, and only surface incidents that actually affect Chalkboard.
  • Long, dense topics are robust. Very long prompts no longer overrun the script step; if something does go wrong late in a run you get a clear failure (and an email) instead of a silent stall, and any spend is still accounted for.
  • Fewer wasted retries. The automated code review is now advisory rather than blocking, so a run isn't sent in circles re-doing a scene that was already fine.
  • Downloads always download. Every download button reliably saves the file instead of opening it in the tab.
  • Video-page tidy-up. The re-render control only shows when it's actually useful, and a dead button was removed.
v0.1.3
Apr 29, 2026
"drawing pipeline overhaul"

A push on what shows up on screen. The animation generator now picks the right Claude model per step, edits surgically instead of regenerating from scratch, and runs every scene through a layout-check pass before committing to a full render. You'll see fewer cluttered scenes, fewer text collisions, and far fewer "the visual is wrong but the audio is right" mismatches. There's also a "Report a visual issue" button on every video so you can flag what slips through.

  • Maximum reliability dial on /generate and the API (reliability=max). Promotes every quality-sensitive agent (script, fact-check, Manim, code review, visual QA) to Opus 4.7 regardless of effort tier. Use it when you're producing reference material or shipping a demo and want the floor as high as it goes.
  • Per-agent model selection. Different Claude models now run different stages. Manim code generation always lands on Opus 4.7 (the highest-leverage call), validators on Sonnet 4.6, research on Sonnet. The model form field still works as a global override when you want to force a single model across all agents.
  • Report a visual issue on every /library/{run_id} page. One-click submit a description (and the segment number that broke) and it lands in an admin triage queue. Confirmed reports are added to a regression set so the same bug class is automatically re-tested on every nightly run.
  • Surgical-edit regeneration. When the validator flags a Manim code issue, the regen prompt now passes the prior code in and instructs minimal targeted changes, instead of regenerating from scratch and discarding everything that worked. Fewer compounding mistakes, more first-pass-correct retries.
  • Scene layout check service. Every scene now runs through a dry-run pass in a dedicated headless validator service before we commit to the full Manim render. Bounding-box overlaps, off-screen elements, and timing overruns get caught up front instead of surfacing as "the render finished but it looks wrong" 90 seconds later.
  • Bigger AST guard library. Nine deterministic syntax-tree checks now catch known Manim-API misuse patterns (Brace.get_text font-size, .animate.copy().next_to, hardcoded wait durations, zone-overflow array layouts, …) before the scene is even sent to the validator. Cheaper than a Claude call and instant.
  • Pipeline telemetry. Every Claude call inside the pipeline now emits a structured event (run_id, agent, model, attempt, outcome, duration). Backs analytics queries we'll surface in admin over time: pass-rate per agent/model, regen rounds by reliability tier, layout-violation distribution, cost-per-run breakdowns.
  • Visual QA always runs on real user submissions. The qa_density=zero setting is now reserved for the regression suite (cost containment); user runs silently coerce to normal so no one lands on a render that skipped the visual review pass.
  • Bot-replay protection. Every signed-in API call from the web UI now carries an app-attestation token in addition to the auth header, so bot scripts can't replay session tokens against the API. No user-visible change; existing keys and SDK clients are untouched.
  • Status page covers more of the stack. The chip in the navbar now reflects every upstream Chalkboard leans on (generation, narration, delivery, hosting, and bot protection), filtered to incidents that actually affect Chalkboard, so unrelated regional blips don't trip it.
  • File uploads now reach the renderer. When you attached files to a generation request, they were saved on the machine that received the request, which the part of the pipeline that runs the job couldn't see. Uploads now go through shared storage that both ends read from.
  • Manim no longer silently downgraded. The model resolver was reading a per-event UI field as the user-supplied override, which let the first agent's resolved model pin every later agent. On low/medium runs this meant Manim ran on Sonnet despite the configuration specifying Opus, so output was visibly worse. Renamed the override key so the collision can't recur.
  • Test-mode model is now Sonnet, not Haiku. The test-mode coercion (chk_test_ API keys) used to force Haiku, which produced unreliable code on Manim scenes. Now coerces to Sonnet 4.6; cost containment comes from effort=low / quality=low / qa_density=zero rather than from a too-cheap model.
v0.1.2
Apr 25, 2026
"public API"

Chalkboard now has a programmatic API. Long-lived API keys, idempotent retries, async webhooks, a versioned /api/v1 surface, and a Python SDK. Same pipeline you'd hit through the web UI, now callable from a script, a CI job, or any other system.

  • API keys: create / list / revoke from /account. chk_live_… for production, chk_test_… for cheap CI runs (forces low effort + Haiku, ≈ 9 credits/run vs ≈ 36 credits live). Pass them in Authorization: Bearer <key>; expiries 30d / 90d / 1y / never.
  • Versioned API surface at /api/v1/.... Existing /api/... routes keep working but now stamp Deprecation: true + Link: …; rel="successor-version" headers so well-behaved clients know to migrate.
  • Idempotency-Key support on POST /api/v1/jobs + /jobs/{id}/retry + /jobs/{id}/rerender. Stripe-style: same key + same body within 24h replays the cached response; same key + different body returns 409. Safe retries on transient failures, no duplicate paid jobs.
  • Webhooks for terminal events (job.completed / job.failed / job.cancelled). Async delivery with exponential backoff (1s → 30s → 5m → 30m → 2h → 12h, then auto-disable). Stripe-style HMAC-SHA256 signature in X-Chalkboard-Signature: t=<unix>,v1=<sha>.
  • Public API documentation at /docs/api: anonymous-readable, persistent version strip, copy-paste signature verifier, full endpoint reference. Homepage's first "Get started" tile now leads with the API; the self-hosted route stays linked from the homepage's get-started section.
  • Python SDK in sdk/python/: a typed sync client, no codegen, single dep (httpx). The four canonical flows (create+poll, create+stream, library, lifecycle) plus a verify_webhook_signature helper. Will ship via GitHub Releases for the closed-beta period.
  • Account dialogs centered + styled. Both the "Create API key" and "Add webhook" dialogs now use a shared cb-dialog class with explicit position:fixed; transform:translate(-50%, -50%) so they center reliably across Chrome / Firefox / Safari instead of relying on the implementation-defined <dialog>.showModal() default. The webhook dialog also picks up the same form-input styling that was missing in v0.1.1.
v0.1.1
Apr 24, 2026
"first private beta"

First tagged release after the hosted launch. New ways to start a job: drag-drop files, paste any link, pick the Claude model. A faster post-render flow with re-render from the same script, account self-service, and email-when-done. And a homepage you can browse without signing in.

  • Homepage video previews: the four "Made with Chalkboard" tiles now open an in-page player on click, so anon visitors can watch the full video without signing in. Modifier-click (Cmd / Ctrl) still opens /library/{run_id} in a new tab.
  • Re-render button on the video page: regenerate the video from the same script + voiceover with one click. Skips research, scripting, and TTS and re-enters the graph at the Manim stage, typically 3 to 5 times faster than a full re-run.
  • Drag-and-drop file upload on the generate form: drop PDFs, images, code, .docx, or .txt as source material; files attach to the job's context window.
  • Claude model selector on the generate form: Opus 4.7, Sonnet 4.6, or Haiku 4.5. The cost estimate scales with the choice (Opus 5×, Sonnet 1×, Haiku 0.27×).
  • Live tokens + cost on /progress: Claude token count + live spend stream in over SSE so you watch the run cost as it materializes instead of after the fact.
  • "Email me when it's done": opt-in completion email, only sent when the render actually succeeds.
  • /account page: email, sign-out, and full account delete (cascades across videos, jobs, spend log, and your stored renders).
  • "Report an issue" link on every /progress run, opening a dialog that sends straight to hello@chalkboard.studio with the run_id.
  • Unified Links field on the generate form: paste GitHub repos and arbitrary URLs into one input; the classifier routes owner/repo shapes to GitHub fetches and the rest to plain webpage fetches.
  • Per-segment transcript on the video page with clickable [MM:SS] timestamps that seek the player.
  • Model + narrator + tokens + cost stat-rows on the video detail page. Every "soon" placeholder from launch is now live data.
  • Topic cap raised to 32,000 characters on the generate form, with a live character counter.
  • Site-themed transactional emails: waitlist, invite, completion, and admin messages now use the dark chalkboard aesthetic with bulletproof CTAs.
  • Homepage tiles served the wrong videos: clicking Hash Tables played the NoSQL video; all four tiles were shuffled. Each tile now plays the video its label promises.
  • "Make your own" in the preview modal now closes the modal and smooth-scrolls to Get Started instead of leaving the dialog open over the destination.
  • Completion emails were silently dropped for several days post-launch. They are now reliably sent when a render finishes.
  • /progress stuck on "Generating" after a successful run: an explicit completion banner with a "View video →" CTA replaces the live feed once the job finishes.
  • SSE replay storm on terminal-page reloads was double-counting tokens and re-activating render stages; replay now stops once the run is done.
  • Report-an-issue modal centered correctly across Chrome / Firefox / Safari (was top-pinning or corner-anchoring depending on UA).
  • Topic echo on /progress clamps to two lines with a hover-tooltip for the full string instead of overflowing on long prompts.
  • "Get started" CTAs for already-signed-in users now lead to /library instead of restarting the signup flow.
  • Long-stuck "running" jobs auto-finalize via a periodic reaper so the library no longer accumulates phantom in-flight rows after a worker crash.
  • Cancel propagates from the API → worker so clicking Cancel on /progress actually stops the in-flight render.
  • Spend tracking survives a job failure: partial token spend flushes even when the pipeline aborts.
  • Invite-code carry-over on Google sign-up: the email-form invite no longer disappears when the visitor switches to Google.
  • Versioning unified behind a single config.APP_VERSION + /api/meta endpoint. Every footer + nav badge reads from the same source so they can't drift again. The nav now shows a "What's new" badge when there's a release the visitor hasn't seen.
  • Effort wording clarified on the generate form: low / medium / high now describe what each does at runtime instead of just labeling.
  • Pipeline stations consolidated on /progress: the Planner + Writer steps merged into one "Script" station to match the underlying graph.
Apr 13 – 19, 2026
"hosted launch"

Chalkboard is live at chalkboard.studio. Four new app pages (generate, progress, library, video) replaced the old single-page UI; auth gates every page with Google sign-in, email/password, and invite codes.

  • Hosted launch: chalkboard.studio is live, on managed cloud hosting with auto-provisioned TLS. Privacy-friendly web analytics on all pages.
  • Auth: login.html with Google Sign-In, email/password, and invite-code gate. Every app page redirects to login if unauthenticated; a shared auth module provides requireAuth() and authHeaders() for all API calls.
  • Generate page: rebuilt with tone/palette tag-chips, pace slider (unhurried 0.85x / steady 1.0x / brisk 1.15x), captions selector, and all fields wired to real CreateJobRequest fields (tone, theme, speed, burn_captions).
  • Progress page: live pipeline station updates over SSE, scrollable peek log, placeholder content cleared immediately after auth so the page never shows stale skeleton state.
  • Library page: real API-backed video grid; filter chips wired to ?status= param; date-grouped headers from JS; hardcoded placeholder cards removed.
  • Video detail page: player + transcript + download links, all auth-guarded.
  • Nav user dropdown: "Logged in as <name>" with a Sign Out button. Click-to-open instead of sign-out-on-click, consistent across all app pages.
  • Invite-code waitlist endpoint (POST /waitlist): stores email + invite code, validates the code, returns 200/401/409.
  • Library filter chips no longer show hardcoded counts (All 47 / Done 43 / …). Counts and category tags were placeholder-only and are now removed until the API exposes them.
  • Generate form captions default changed from "SRT + burned-in" to "SRT file only" so burn is opt-in.
  • Broken @keyframes nav-news-pulse block: a Python regex over-matched the closing brace, inserting nav-user CSS inside the keyframe rule (silently discarded by browsers). Fixed with a pattern that captures the full two-stop keyframe block before inserting after it.
  • Progress page peek scrollbox now has a fixed height instead of min-height, preventing it from expanding infinitely on long runs.
  • Docs API page (/docs/api): stylesheet referenced as ./docs.css (relative, so it resolved to /docs/docs.css, 404). Fixed to absolute /docs.css.
  • Stale improved.html links in nav across docs pages replaced with /guide.
Apr 10 – 12, 2026
"web-UI-first docs"

Rebuilt the marketing + docs site around the web UI instead of the CLI. Real screenshots, a 24-second timelapse of the pipeline, and a clickable lightbox on every image.

  • New docs pages: guide.html (web-UI-first walkthrough with screenshots), cli.html (flag reference, context injection, env vars), and api.html (HTTP + Python API reference with sticky ToC).
  • Clickable screenshot lightbox across the whole site, including the progress timelapse video.
  • Progress-pipeline timelapse (progress.mp4, 24s) autoplaying on the homepage and guide.
  • Homepage quickstart now lists the web UI before the CLI; pipeline diagram simplified to topic → script → fact-check → animation → validate → video.
  • Positioning softened from "self-hosted web app" to "multi-agent pipeline", neutral wording that works for the hosted version when it ships.
  • Real web-UI screenshots replace the placeholder PNGs; EXIF/ICC metadata stripped; library shot recompressed 1.2 MB → 557 KB.
Apr 7 – 9, 2026
"library & layout checker"

Generated videos live in a real library now. The pipeline gained a pre-render layout checker that runs a headless Manim dry-run to catch off-screen and overlapping elements before we waste a full render. And there's a Claude API status dot in the nav.

  • Video library: SQLite-backed library with 4-column grid, search, sort, numbered pagination, per-page selector, per-card three-dot menu (Open / Delete), and a detail page with interactive transcript that seeks the player on click.
  • Layout checker node: ChalkboardSceneBase with per-segment bounding-box, off-screen, and timing validation. Wired between code_validator and render_trigger; failures route back to the Manim agent with a formatted violation report.
  • Job status overhaul: always-visible activity icon in the nav with a dropdown of in-progress + recent jobs, replacing the single-job pill. Multi-job localStorage, animated badge dot, per-job dismiss.
  • Claude API status dot: /api/claude-status parses status.claude.com's RSS (cached 5 min) and shows green/yellow/red in the nav with the active incident on hover.
  • New templates: howto (numbered step list with progressive reveal) and timeline (horizontal axis with dated markers).
  • AI-generated video titles: script_agent now produces a concise YouTube-style title alongside the script, flowing through manifest.json into the library.
  • Subtitle track + SRT captions served on the video detail page; download includes captions.srt.
  • Watchfiles reloader no longer restarts the server on pipeline writes to output/ or *.db, which was killing in-flight jobs.
  • Multi-column tables no longer overflow zone boundaries: the manim_agent prompt now includes a bounding-box rule (right_edge = x_0 + (N - 0.5) * W).
  • Layout-checker timing tracker no longer double-counts wait() duration (Manim's wait() internally calls play(Wait(...)), causing phantom +1s per wait).
Apr 3 – 6, 2026
"the web UI"

The biggest week so far. Chalkboard grew a FastAPI server and a single-page web UI with live pipeline progress over SSE, file upload with per-type size caps, and a redesigned form split into Source Material / Style / Output sections.

  • FastAPI server (server/): POST /api/jobs, POST /api/jobs/upload (multipart), GET /api/jobs/{id}/events (SSE), job file serving with path-traversal guard. Configurable port via SERVER_PORT.
  • Web frontend: vanilla index.html with job form, live SSE progress stages, and a video player + download links on completion.
  • File upload zone with drag-and-drop (folders too), per-type caps (text 2 MB · image 5 MB · PDF 20 MB · DOCX 10 MB), 24 MB total. Accepts .bat/.ps1 too.
  • Advanced form sections: Source Material (files / URLs / GitHub repos), Style (tone, theme, template), Output (speed, QA density, burn captions, quiz).
  • Help tooltips (?) on every form field and section heading.
  • Favicon, privacy-friendly web analytics, search-console verification.
  • --effort high web search: research_agent and script_agent now scan reversed(response.content) for the text block (the web_search tool inserts ServerToolUseBlocks first). Graceful TimeoutExhausted fallback so the pipeline continues on training data if search fails.
  • config.py loads .env so the server picks up TTS_BACKEND and other vars when started via uvicorn.
  • Pipeline failures now surface in job.error instead of dying silently; render is guarded on manifest.json existing.
Mar 29 – Apr 2, 2026
"context, timeouts, templates"

Context injection arrived in three flavours: local files, URLs, and GitHub repos. A timeout + retry layer now wraps every Claude call and the Docker render. The research agent (web search via Anthropic's web_search_20250305 tool) is wired in on high effort. Plus --template, --speed, chapter markers, and SRT captions.

  • --context, --url, --github owner/repo flags. pipeline/context.py collects files (gitignore-aware via pathspec) or fetches URLs (httpx + beautifulsoup4, 100k-char cap). --yes skips the large-context confirmation for non-interactive runs.
  • --template algorithm|code|compare|howto|timeline injects layout + visual conventions into the Manim codegen prompt.
  • --speed 0.25..4.0 with native support on OpenAI TTS, and an ffmpeg atempo chain for Kokoro/ElevenLabs.
  • Always-on chapter markers (FFMETADATA1 atoms in final.mp4, plus stdout list) and captions.srt. --burn-captions re-encodes with the subtitles filter.
  • --qa-density zero|normal|high for visual QA sampling rate (1 frame per 30s normal, 1 per 15s high, up to 10/20 frames respectively).
  • research_agent node: runs on effort_level=high, produces a structured brief + sources that script_agent consumes (and skips its own web_search when the brief is present).
  • pipeline/retry.py with TimeoutExhausted and api_call_with_retry; per-agent timeouts. Docker render has adaptive timeout + retry loop with RenderFailed escalation.
  • Manim codegen pitfalls: DASHED isn't a v0.20.1 constant (use DashedLine); pointer-label stacks over a descriptive line need buff≥0.85; obj.copy().next_to() inside .animate captures position at animation start, so use absolute offsets.
  • Segment-accurate A/V sync: generated scenes load per-segment durations from segments.json at render time instead of using hardcoded estimates. code_validator rejects scenes with hardcoded self.wait(float).
  • OpenAI TTS duration: compute from actual PCM bytes, not the WAV header (OpenAI returns nframes=INT_MAX, which made Manim render multi-day videos).
Mar 25 – 28, 2026
"first light"

The initial pipeline ships. Topic goes in, narrated Manim video comes out, in one command.

  • LangGraph pipeline: script_agentfact_validatormanim_agentcode_validatorrender_trigger, with retry loops and escalate_to_user for unrecoverable failures.
  • TTS backends: Kokoro (local, default), OpenAI, ElevenLabs, swappable via the TTS_BACKEND env var.
  • Single-command render: main.py orchestrates the pipeline, builds the Docker image on first run, runs Manim in the container, then merges audio with host-side ffmpeg (Linux AAC encoder doesn't produce QuickTime-compatible output).
  • --audience beginner|intermediate|expert, --tone casual|formal|socratic, --theme chalkboard|light|colorful.
  • --preview for fast 480p15 renders (separate preview.mp4, doesn't clobber the full HD output).
  • Post-render visual QA: samples frames, sends them to Claude with the scene source, and re-invokes manim_agent up to 2 times if it flags issues.
  • Resume via --run-id; skip render if final.mp4 already exists.

Still in closed beta. If you'd like an invite, the waitlist is open; if you'd rather not wait, the open-source repo runs the same pipeline on your own machine.

Full commit history on GitHub ↗