Whisper is an automatic speech recognition (ASR) system developed by OpenAI. It was trained on 680,000 hours of multilingual and multitask supervised data collected from the web, and OpenAI reports that it approaches human-level robustness and accuracy on English speech recognition. The system can transcribe speech in multiple languages and translate speech from those languages into English.
The model is an encoder-decoder Transformer. Input audio is first converted into a log-Mel spectrogram, which the encoder processes; the decoder then generates the corresponding text. Because Whisper’s code and model weights are open source, developers can integrate it into their own applications, for example to improve accessibility or to add new speech-processing features.
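To make the log-Mel spectrogram step more concrete, here is a minimal sketch in Python using only NumPy. It is not Whisper's own preprocessing code. The constants (16 kHz audio, a 400-sample window, a 160-sample hop, 80 mel bands) are assumptions chosen to resemble the settings of the reference implementation, the mel scale uses the common HTK-style formula, and the function names are hypothetical. Padding to a fixed length and value normalization are left out.

```python
import numpy as np

# Assumed settings, chosen to resemble Whisper's reference preprocessing.
SAMPLE_RATE = 16_000  # samples per second
N_FFT = 400           # 25 ms analysis window
HOP = 160             # 10 ms stride between frames
N_MELS = 80           # number of mel bands


def hz_to_mel(f):
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(audio):
    """Return a (N_MELS, n_frames) log-Mel spectrogram of a 1-D float signal."""
    audio = np.asarray(audio, dtype=np.float64)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(N_FFT)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, N_FFT//2 + 1)
    mel = power @ mel_filterbank().T                  # (n_frames, N_MELS)
    # Floor the energies to avoid log(0) in empty bands.
    return np.log10(np.maximum(mel, 1e-10)).T


# Usage: one second of a 440 Hz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # (80, 98)
```

Each column of the result describes roughly 25 ms of audio, and each row is one mel band. In an encoder-decoder model of the kind described above, a matrix like this is what the encoder receives in place of the raw waveform.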