Willow on Cloudflare — the developer platform underneath Voice AI's fastest scale-up

Willow ships the voice product. Cloudflare runs the global edge underneath.

You've shipped the hard part: a dictation engine fast enough and accurate enough that 50K+ users prefer it to the keyboard. The next infrastructure surface is the part users don't see: the inference plane behind Scribe, the binary + model distribution to every Mac and Windows desktop, the Teams tenancy boundary, and the SOC 2 audit surface for the next enterprise security review. Cloudflare's developer platform is built for exactly that shape, and you're already on it for DNS + the API plane.

Willow builds

The voice product: dictation engine, Scribe, desktop clients

Mac, Windows, and iPhone clients. The ASR + post-processing engine that handles whispered speech, accents, mid-sentence edits, and context-aware spelling. Style-matching across apps. Scribe as the new Voice AI writing assistant. Voice commands. SOC 2 + HIPAA + zero data retention compliance.

Best-in-class ASR + post-processing (handles whispered speech)
Native Mac, Windows, iPhone clients with hotkey UX
Scribe — voice-driven AI writing assistant on top of dictation
Style-matching + context-aware spelling per app

Cloudflare runs

The global edge: API, inference, binary delivery, tenancy

The same edge that's already serving api.willowvoice.com. AI Gateway in front of Scribe's inference layer. R2 for model weights + desktop binary distribution at zero egress. Workers for Platforms for the Teams plan tenancy. Zero Trust for the founder-team's access to production.

AI Gateway for Scribe inference — cache, attribution, BYO keys
R2 + Workers for Mac/Windows binary delivery, zero egress
Workers for Platforms for the Teams plan — per-org tenancy
Workers AI for the audio-side ASR + post-processing where it fits

Nine primitives, mapped to Willow's actual product surface.

Each maps to something you ship today (the API plane, the desktop clients, Scribe, the audio pipeline, the HIPAA-grade compliance surface) or something on the roadmap (Teams plan, organization-level controls, enterprise security reviews). Status tags show what's already live in your Cloudflare footprint.

PRIMITIVE 01 Live on CF

DNS for willowvoice.com

Authoritative DNS via christian.ns.cloudflare.com and dorthy.ns.cloudflare.com. The foundation everything else snaps onto. Procurement is in place, SOC 2 mapping exists, and the support relationship is established, so expansion is a configuration change, not a vendor selection.

DNS Foundation

PRIMITIVE 02 Live on CF

API plane on Cloudflare anycast

api.willowvoice.com resolves to Cloudflare anycast (104.26.x / 172.67.x), which means it is a Worker, an origin proxied through Cloudflare, or an origin behind a Cloudflare Tunnel. In all three cases, the desktop and iOS clients already talk to Cloudflare on every request.

Workers Anycast API
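The anycast observation above can be checked mechanically. A minimal sketch, assuming only the two published Cloudflare IPv4 blocks that cover the 104.26.x and 172.67.x addresses (the full published list is longer), and using made-up sample addresses:

```typescript
// Subset of Cloudflare's published IPv4 ranges: only the two blocks that
// cover the 104.26.x and 172.67.x addresses mentioned above.
const CLOUDFLARE_CIDRS: string[] = ["104.24.0.0/14", "172.64.0.0/13"];

// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True when `ip` falls inside the CIDR block, e.g. "104.24.0.0/14".
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// True when `ip` is inside one of the listed Cloudflare blocks.
function isCloudflareAnycast(ip: string): boolean {
  return CLOUDFLARE_CIDRS.some((cidr) => inCidr(ip, cidr));
}
```

Calling `isCloudflareAnycast` on each A record returned for api.willowvoice.com would confirm the footprint without relying on response headers.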

PRIMITIVE 03 Highest-leverage next

AI Gateway in front of Scribe

Scribe = "polished message from a few quick words" = LLM call per invocation, multiplied across 50K+ users. AI Gateway sits in front: semantic cache for repeated patterns ("draft a Slack follow-up to X"), per-user attribution, budget caps before runaway spend, BYO keys for enterprise customers when the Teams plan ships.

AI Gateway Semantic cache Scribe
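One way to picture the attribution and cache controls is as a request builder. This is a sketch under assumptions: the account ID, gateway name, and user ID are hypothetical, and the URL shape and the `cf-aig-cache-ttl` / `cf-aig-metadata` headers follow AI Gateway's documented conventions as understood here, so they should be confirmed against the current docs before use.

```typescript
interface ScribeCall {
  accountId: string;       // hypothetical Cloudflare account ID
  gatewayId: string;       // hypothetical gateway name
  provider: string;        // upstream provider slug, e.g. "openai"
  userId: string;          // hypothetical Willow user ID, used for attribution
  feature: "dictation" | "scribe";
  cacheTtlSeconds: number; // how long a cached response may be reused
}

interface GatewayRequest {
  url: string;
  headers: Record<string, string>;
}

// Build the URL and headers for a provider call routed through the gateway.
function buildGatewayRequest(call: ScribeCall, providerPath: string): GatewayRequest {
  const base = "https://gateway.ai.cloudflare.com/v1";
  const url = `${base}/${call.accountId}/${call.gatewayId}/${call.provider}/${providerPath}`;
  return {
    url,
    headers: {
      "content-type": "application/json",
      // Cache lifetime for this request, in seconds.
      "cf-aig-cache-ttl": String(call.cacheTtlSeconds),
      // Free-form metadata that shows up in gateway logs for attribution.
      "cf-aig-metadata": JSON.stringify({ userId: call.userId, feature: call.feature }),
    },
  };
}
```

The provider API key would be added as the usual authorization header by the caller; it is left out here so the sketch stays provider-neutral.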

PRIMITIVE 04 Distribution

R2 for Mac + Windows + iOS binary delivery

Every desktop install pulls a binary. Every release updates it. With 50K+ users across three platforms and growing fast, that's a meaningful CDN bill at any reasonable scale. R2's zero egress fees, with Cloudflare's cache in front, serve each download from a POP near the user without per-region S3 sprawl. (Smart Placement does something different: it moves a Worker closer to its backend, not closer to the user.)

R2 Zero egress Edge cache
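A release bucket of this kind usually comes down to a key scheme plus cache headers. The sketch below is illustrative only: the key layout, the `dmg` / `msi` extensions, the `latest.json` pointer file, and the platform slugs are all assumptions, not Willow's actual release layout.

```typescript
type Platform = "macos-arm64" | "macos-x64" | "windows-x64";

interface Release {
  platform: Platform;
  version: string; // expected as MAJOR.MINOR.PATCH
}

// Assumed installer formats per platform.
const EXTENSION: Record<Platform, string> = {
  "macos-arm64": "dmg",
  "macos-x64": "dmg",
  "windows-x64": "msi",
};

// Map a release to a versioned, immutable object key in the bucket.
function releaseObjectKey(r: Release): string {
  if (!/^\d+\.\d+\.\d+$/.test(r.version)) {
    throw new Error(`unexpected version format: ${r.version}`);
  }
  return `releases/${r.platform}/${r.version}/willow-${r.version}.${EXTENSION[r.platform]}`;
}

// Versioned artifacts never change, so they can be cached for a year.
// The "latest" pointer changes on every release, so it gets a short TTL.
function cacheControlFor(key: string): string {
  return key.endsWith("/latest.json")
    ? "public, max-age=60"
    : "public, max-age=31536000, immutable";
}
```

Keeping the version in the key is what makes the long cache lifetime safe: an update is a new object, never an overwrite.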

PRIMITIVE 05 Teams plan

Workers for Platforms = per-org tenancy

The Teams page exists today (in the footer nav). When the Teams plan ships, every enterprise customer wants their own custom dictionary, their own SSO, their own data residency, their own audit log, their own Scribe budget. Workers for Platforms makes those boundaries infrastructure, not config flags.

Workers for Platforms Per-org Teams
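Per-org tenancy usually starts with a dispatch step that maps a request to one tenant's Worker. A minimal sketch of that mapping, assuming a hypothetical `<org>.teams.willowvoice.com` hostname scheme and an `org-` naming prefix (neither is from Willow's product):

```typescript
// Resolve a request hostname to the name of the tenant Worker that should
// handle it, or null when the hostname is not a valid tenant hostname.
function tenantWorkerName(hostname: string, baseDomain: string): string | null {
  const suffix = `.${baseDomain}`;
  const host = hostname.toLowerCase();
  if (!host.endsWith(suffix)) return null;
  const slug = host.slice(0, -suffix.length);
  // Reject nested subdomains and anything that is not a simple slug.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(slug)) return null;
  return `org-${slug}`;
}

// In a dispatch Worker, the resolved name would be passed to the dispatch
// namespace binding to fetch that tenant's Worker. The binding name and
// surrounding handler are deployment-specific and are not shown here.
```

Because the lookup returns null for anything outside the scheme, an unknown hostname never reaches a tenant Worker by accident.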

PRIMITIVE 06 Custom dictionary

Vectorize for context-aware spelling

"Willow spells unique terms and names correctly using contextual cues." That's a retrieval problem. Vectorize indexes per-user (and per-org, when Teams ships) custom dictionaries + recent context so the right spelling comes back in single-digit milliseconds.

Vectorize Custom dict Context
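The retrieval framing can be illustrated with a nearest-neighbor lookup. This sketch uses a tiny in-memory list and toy three-dimensional vectors as a stand-in for a vector index; the terms and embeddings are invented, and a real deployment would use model-generated embeddings and an index query rather than a linear scan.

```typescript
interface DictEntry {
  term: string;     // the correctly spelled custom term
  vector: number[]; // embedding of the context in which the term appears
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the topK dictionary terms whose context is closest to the query.
function nearestTerms(query: number[], entries: DictEntry[], topK: number): string[] {
  return entries
    .map((entry) => ({ term: entry.term, score: cosine(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.term);
}
```

Scoping `entries` to one user or one org before the search is what turns the same lookup into a per-tenant dictionary.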

PRIMITIVE 07 Streaming

Durable Objects for live transcription state

Live dictation sessions have state: the audio buffer, the running transcript, the user's style profile, the in-flight edit operations. Durable Objects give you a single-writer state holder per active session, at the edge, with native WebSocket support — no Redis cluster needed.

Durable Objects WebSockets Sessions
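The single-writer idea can be modeled in plain TypeScript: one object per session ID, with every edit applied through that object. The class and operation names below are hypothetical, and the in-memory registry only imitates the "same ID always reaches the same instance" property; it is not the Durable Objects API.

```typescript
type EditOp =
  | { kind: "append"; text: string }       // add a newly recognized word
  | { kind: "replaceLast"; text: string }; // correct the most recent word

// Holds the running transcript for one live dictation session.
class DictationSession {
  private words: string[] = [];

  apply(op: EditOp): void {
    if (op.kind === "replaceLast" && this.words.length > 0) {
      this.words[this.words.length - 1] = op.text;
    } else {
      // An append, or a replaceLast on an empty transcript.
      this.words.push(op.text);
    }
  }

  transcript(): string {
    return this.words.join(" ");
  }
}

// Returns the same session object for the same ID, so each session has
// exactly one holder of its state.
class SessionRegistry {
  private sessions = new Map<string, DictationSession>();

  get(sessionId: string): DictationSession {
    let session = this.sessions.get(sessionId);
    if (session === undefined) {
      session = new DictationSession();
      this.sessions.set(sessionId, session);
    }
    return session;
  }
}
```

With one holder per session, a mid-sentence correction and a newly arrived word cannot interleave into an inconsistent transcript.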

PRIMITIVE 08 Compliance

Zero Trust for founder-team production access

SOC 2 + HIPAA require audit-grade access controls to production. As Willow grows from YC X25 stage into enterprise sales, the security reviews get harder. Cloudflare Access closes the loop — identity-aware, audit-logged access to admin consoles and model environments without standing up a separate IdP stack.

Access Tunnel SOC 2

PRIMITIVE 09 Bot protection

Bot Management + Turnstile on signups

50K+ users with a free tier and a $15/mo paid tier = a magnet for automated abuse: fake signups, free-tier exploitation, scraped binaries. Bot Management at the edge stops the abuse before it touches the auth backend; Turnstile drops in cleanly on signup and download flows.

Bot Management Turnstile WAF

A Scribe request is a pipeline waiting to be cached.

"Just say a few quick words. Willow turns it into a fully polished message." Every Scribe invocation is an LLM call. Across 50K+ users typing emails and Slack messages and Cursor prompts, those requests cluster brutally — same draft-an-apology, same write-a-follow-up, same rewrite-this-more-professional patterns. The cache hit rate is structural, and AI Gateway captures it from request one.

Scribe request flow, sketched on Cloudflare primitives

From "user hits hotkey and speaks a rough sentence" to a polished, formatted, on-brand response in their target app — cached, attributed, audited.

DESKTOP CLIENT

Mac / Windows / iOS hotkey

audio + app context captured

→

EDGE ROUTING

Workers on api.willowvoice.com

already live — closest POP to user

→

CACHE + ROUTE

AI Gateway + Vectorize

semantic cache, per-user style retrieval

→

RESPONSE

Streamed back through Workers

attribution logged, ZDR maintained

What this changes: The same "rewrite this more professionally" prompt gets called thousands of times an hour across all 50K+ users. With semantic cache in AI Gateway, identical-intent prompts return in milliseconds without touching the model layer at all. That's the difference between a Scribe inference bill that scales linearly with total requests and one that scales only with unique requests, which grow more slowly than the user base.
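The effect on model calls can be illustrated with a toy count. This sketch swaps semantic matching for a much simpler stand-in: prompts are lowercased and stripped of punctuation, and only exact matches after that normalization count as cache hits. Real semantic matching would compare embeddings, so the numbers here only show the shape of the effect.

```typescript
// Reduce a prompt to a crude cache key: lowercase, no punctuation,
// single spaces. This is a stand-in for semantic matching.
function normalizeIntent(prompt: string): string {
  return prompt
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

interface CacheStats {
  requests: number;   // total prompts seen
  modelCalls: number; // prompts that missed the cache and reached the model
  hitRate: number;    // fraction of prompts served from cache
}

// Count how many prompts would reach the model with the cache in front.
function countModelCalls(prompts: string[]): CacheStats {
  const seen = new Set<string>();
  let modelCalls = 0;
  for (const prompt of prompts) {
    const key = normalizeIntent(prompt);
    if (!seen.has(key)) {
      seen.add(key);
      modelCalls++;
    }
  }
  const requests = prompts.length;
  const hitRate = requests === 0 ? 0 : (requests - modelCalls) / requests;
  return { requests, modelCalls, hitRate };
}
```

In this model the bill tracks `modelCalls`, which grows with the number of distinct keys rather than with `requests`.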

The economics of Scribe at 50K→500K users.

Voice dictation is one of the most edge-amenable workloads in software: small audio packets in, structured text out, latency-sensitive at every step. Add Scribe's LLM layer on top, and you're now also paying for inference per invocation. AI Gateway turns both into one observable, attributable cost line — before the Series A inference bill becomes a board-meeting agenda item.

A back-of-the-envelope, not a quote

Modeled across Scribe inference + API serving + binary delivery at the current scale (50K+ users, growing fast)

SEMANTIC CACHE HIT RATE

40–60%

Productivity-prompt queries cluster heavily: "draft a follow-up," "rewrite professionally," "summarize this." Higher hit rate than general LLM workloads.

BINARY EGRESS SAVINGS

40–60%

R2's zero egress vs. AWS S3 + CloudFront pricing across Mac / Windows / iOS installer + update delivery as the user base scales.

PER-USER ATTRIBUTION

100%

AI Gateway gives per-user, per-feature (dictation vs Scribe), per-platform attribution — the data needed to defensibly tier free vs. $15/mo vs. Teams.

The real win for a YC X25 company is unit-economic clarity. Today Scribe's per-user inference cost is a guess. AI Gateway makes it a line item: this user costs $0.X/month in inference, that user costs $X.X. That's the data a Series A investor wants to see, and it's the data Willow needs to price Teams at the right margin instead of cost-plus.
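Turning per-request logs into a per-user line item is a small aggregation. A sketch, assuming a hypothetical log record shape with a user ID, a feature tag, a cost, and a cache flag; the field names are illustrative, not AI Gateway's log schema.

```typescript
interface InferenceLog {
  userId: string;
  feature: "dictation" | "scribe";
  costUsd: number; // provider cost of the call if it reached the model
  cached: boolean; // true when the response was served from cache
}

// Sum billable inference cost per user. Cached responses cost nothing
// at the model layer, so they add zero.
function costPerUser(logs: InferenceLog[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const log of logs) {
    const billable = log.cached ? 0 : log.costUsd;
    totals.set(log.userId, (totals.get(log.userId) ?? 0) + billable);
  }
  return totals;
}
```

Grouping by `feature` instead of `userId` gives the dictation versus Scribe split with the same loop.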

Three platforms, three tenants. Workers for Platforms is the boundary.

Mac, Windows, and iPhone have different update cadences, different binary sizes, different telemetry shapes, and different App Store / notarization / Microsoft Store distribution flows. Each platform's binary delivery, telemetry ingest, and crash reporting can live in its own dispatch namespace inside Workers for Platforms: same edge, isolated state.

Per-platform delivery + per-org Teams tenancy, sketched

Each desktop platform gets its own namespace for binary delivery + telemetry. Each Teams plan customer (when the plan ships) gets their own isolated tenant on the same edge.

🌿

macOS

Apple Silicon + Intel, Notarized

🖥️

Windows

x64, signed MSI, MS Store

📱

iOS

App Store, TestFlight beta

↓

Shared control plane — Workers for Platforms + R2 + AI Gateway

one runtime · one observability surface · platform × Teams customer = N×M isolated tenants by construction

Current stack, with Cloudflare overlaid.

Rows that cite a hostname, nameserver, or header are sourced from public DNS records and HTTP response headers on willowvoice.com, app.willowvoice.com, and api.willowvoice.com; rows marked "likely" are inferences. The purple rows are already running on Cloudflare today. The orange column is the expansion footprint.

What's running today, and where Cloudflare slots in

Purple rows = already on Cloudflare. Orange column = the expansion path. No Framer or Vercel rip-and-replace required.

| LAYER | WILLOW RUNS TODAY | CLOUDFLARE FIT |
| --- | --- | --- |
| DNS | Cloudflare (christian + dorthy.ns.cloudflare.com) | ✅ Live: the foundation everything else snaps onto |
| API PLANE | Cloudflare anycast (104.26.x / 172.67.x on api.willowvoice.com) | ✅ Live: already a Worker or proxy on the edge |
| MARKETING SITE | Framer (server: Framer/8c09469, us-west-2) | No change: Framer fronts cleanly behind the existing CF zone |
| APP DASHBOARD | Vercel on app.willowvoice.com (vercel-dns-016) | + Cloudflare in front: edge cache, WAF, Bot Mgmt, Turnstile |
| SCRIBE AI INFERENCE | Just launched; likely OpenAI / Anthropic from the API plane | + AI Gateway: cache, attribution, rate-limit, budget cap from day one |
| BINARY CDN | Likely S3 + CloudFront for Mac / Windows / iOS installers | + R2 + Workers + edge cache: zero egress fees at scale |
| CUSTOM DICTIONARY | Per-user storage of custom terms + names | + Vectorize + D1 for context-aware retrieval at the edge |
| TEAMS TENANCY | Teams page exists; plan likely roadmap-imminent | + Workers for Platforms: per-org isolation by construction |
| LIVE SESSION STATE | In-flight transcription state per active session | + Durable Objects + WebSockets: single-writer state per session |
| EMAIL | Google Workspace (MX = aspmx.l.google.com) | + Cloudflare Email Security as defense-in-depth (optional) |
| SIGNUP / DOWNLOAD ABUSE | Free tier + paid tier = scrape-attractive surface | + Bot Management + Turnstile + WAF on signup, download, redemption |
| FOUNDER-TEAM ACCESS | YC-stage startup; likely VPN or no-VPN to admin consoles | + Cloudflare Access: SOC 2 / HIPAA audit-grade access boundary |

Why this is the right week to start the conversation

Scribe just shipped. The banner is live on willowvoice.com today. Every architectural decision being made this quarter (what governs Scribe's inference, how attribution works as Teams ramps, what the unit economics look like at 500K users) will define the inference cost curve for the rest of 2026 and 2027. An hour spent putting AI Gateway in front of Scribe is the cheapest way to get ahead of that curve.

You're already on Cloudflare. DNS via christian + dorthy. The API plane via anycast. No procurement event to start, no security review to begin, no MSA to negotiate. Expanding the footprint from DNS + API to AI Gateway + R2 + Workers for Platforms is the most natural roadmap conversation in the lineup.

YC X25 timing. The Series A architecture decisions get locked in over the next 12–18 months. Picking the runtime now — while the team is small enough to move fast and the user base is small enough that re-platforming is cheap — is materially cheaper than doing it after Teams customers depend on it.

You're shipping 5× faster than the keyboard.
The runtime should be as edge-native as the dictation.

What's already running on Cloudflare today