Whisper WebGPU

Real-time in-browser speech recognition


You are about to load whisper-base, a 73 million parameter speech recognition model that is optimized for inference on the web. Once downloaded, the model (~200 MB) will be cached and reused when you revisit the page.
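The download-once, reuse-later behaviour described above can be sketched as a cache-first lookup. This is only an illustration, not how Transformers.js implements its caching: the names ByteCache, Downloader, loadCached and MemoryCache are hypothetical, and the in-memory Map stands in for whatever persistent storage the browser actually uses.

```typescript
// Hypothetical minimal cache interface (an assumption for this sketch);
// a real browser cache would persist across page visits.
interface ByteCache {
  get(url: string): Promise<Uint8Array | undefined>;
  put(url: string, bytes: Uint8Array): Promise<void>;
}

// Hypothetical download function: resolves to the raw bytes of a model file.
type Downloader = (url: string) => Promise<Uint8Array>;

// Return the file from the cache if present; otherwise download it once,
// store it, and return it.
async function loadCached(
  url: string,
  cache: ByteCache,
  download: Downloader,
): Promise<Uint8Array> {
  const hit = await cache.get(url);
  if (hit !== undefined) {
    return hit; // revisit: no network request needed
  }
  const bytes = await download(url); // first visit: fetch the file
  await cache.put(url, bytes);
  return bytes;
}

// In-memory stand-in for a persistent cache, for illustration only.
class MemoryCache implements ByteCache {
  private store = new Map<string, Uint8Array>();

  async get(url: string): Promise<Uint8Array | undefined> {
    return this.store.get(url);
  }

  async put(url: string, bytes: Uint8Array): Promise<void> {
    this.store.set(url, bytes);
  }
}
```

With this shape, a second call to loadCached for the same URL never invokes the downloader, which is the property that makes a revisit fast.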

Everything runs directly in your browser using 🤗 Transformers.js and ONNX Runtime Web, meaning no data is sent to a server. You can even disconnect from the internet after the model has loaded!
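Because inference happens locally, a page like this needs to know whether the browser exposes WebGPU before it starts. A minimal sketch of that check follows. It assumes a fallback to the WebAssembly backend of ONNX Runtime Web, which this page does not state it performs; pickDevice, NavigatorLike and GpuLike are hypothetical names used so the example runs outside a browser.

```typescript
// Hypothetical narrow view of the browser's navigator object (an assumption
// for this sketch); only the part of the WebGPU API used here is modelled.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

type Device = "webgpu" | "wasm";

// Choose "webgpu" when an adapter is available, otherwise fall back to "wasm".
async function pickDevice(nav: NavigatorLike): Promise<Device> {
  if (!nav.gpu) {
    return "wasm"; // browser does not expose WebGPU at all
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no suitable GPU was found
  } catch {
    return "wasm"; // treat any failure as "WebGPU unavailable"
  }
}
```

In a browser this could be called with the global navigator object, although type definitions for navigator.gpu may need to be added to the project first.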