Oracle Corporation

03/11/2024 | Press release | Distributed by Public on 03/11/2024 10:32

OCI Speech supports the Whisper model

The Oracle Cloud Infrastructure (OCI) Speech service now supports the Whisper model from OpenAI. Trained on a large corpus of multilingual data, Whisper is a speech-to-text model that supports file-based transcription for over 50 languages. It uses the same service endpoints, API, and software development kit (SDK) interfaces as the OCI Speech model, so you can move between models without changing your integration. The Whisper model in OCI Speech also supports speaker diarization, a feature that distinguishes and labels different voices within an audio stream so that each speaker is separated in the transcription.

The Whisper model has five sizes: tiny, base, small, medium, and large-V2. For the best cost-performance trade-off, the medium Whisper model is available in all OC1 regions from both the Oracle Cloud Console and the SDK.

Figure 1: Create an OCI Speech job using the Whisper model

The large-V2 model is supported when submitting a service request in the Ashburn and Phoenix regions. We plan to make more regions and models available in the future, based on customer feedback.
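
The availability rules above can be sketched as a small lookup. This is an illustration only, not part of the OCI SDK: the function name is hypothetical, the region identifiers us-ashburn-1 and us-phoenix-1 are assumed to correspond to the Ashburn and Phoenix regions, and the caller is assumed to pass a region that is already in the OC1 realm. Availability reflects this announcement and may change.

```python
# Regions where the announcement says large-V2 is supported (Ashburn, Phoenix).
LARGE_V2_REGIONS = {"us-ashburn-1", "us-phoenix-1"}


def whisper_model_available(model_size: str, region: str) -> bool:
    """Return True if the announcement lists this Whisper size for the region.

    Hypothetical helper; assumes `region` is an OC1 region identifier.
    """
    size = model_size.lower()
    if size == "medium":
        # Medium is offered in all OC1 regions.
        return True
    if size == "large-v2":
        # Large-V2 is limited to Ashburn and Phoenix.
        return region.lower() in LARGE_V2_REGIONS
    # tiny, base, and small are not mentioned as available in OCI Speech.
    return False
```

A check like this can run before a job is submitted, so that a request for large-V2 outside Ashburn or Phoenix falls back to the medium model.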

Key features and benefits

The Whisper model in OCI Speech offers the following features and benefits:

Multilingual support: Broaden your audience reach with Whisper's multilingual voice-to-text transcription for over 50 languages, including Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Māori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
Diarization for speaker labeling: The Whisper model in OCI Speech can label speakers in audio recordings, so that each of multiple speakers is identified separately in the transcript. You can either specify the number of speakers (2-16) when submitting the transcription job or let OCI Speech automatically detect the number of speakers.
Same API and SDK interface as the native OCI Speech model: The Whisper model uses the same API and SDK interface as the native OCI Speech model, which makes for a smooth transition between models within OCI Speech. See the following table for a comparison of the native OCI Speech model and the Whisper model.

Feature	OCI Speech model	The Whisper model in OCI Speech
Real-time transcription	Supported	Not supported
Maximum file size	Up to 2 GB	Up to 2 GB
Word-level timestamps	Supported	Supported
File format	AAC, AC3, AMR, AU, FLAC, M4A, MKV, MP3, MP4, OGA, OGG, WAV, WEBM	AAC, AC3, AMR, AU, FLAC, M4A, MKV, MP3, MP4, OGA, OGG, WAV, WEBM
Multilingual support	EN, ES, FR, DE, PT, HI, IT	Same as the OCI Speech model, plus 50 other languages
Diarization	Supported	Supported
English translation	Not supported	Coming soon

Table 1: Compare native OCI Speech model and the Whisper model in OCI Speech
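
As a rough illustration of the limits in Table 1 and the diarization speaker range, a client could check a request before submitting a Whisper transcription job. The sketch below uses only the Python standard library and does not call the OCI SDK. The function name and messages are hypothetical, and it assumes "2 GB" means 2 x 1024^3 bytes and that the file format can be inferred from the file extension.

```python
from pathlib import Path
from typing import List, Optional

# File formats listed in Table 1 for the Whisper model in OCI Speech.
SUPPORTED_FORMATS = {
    "aac", "ac3", "amr", "au", "flac", "m4a", "mkv",
    "mp3", "mp4", "oga", "ogg", "wav", "webm",
}
# Assumption: "2 GB" is interpreted as 2 * 1024**3 bytes.
MAX_FILE_BYTES = 2 * 1024**3


def check_whisper_job(
    file_name: str,
    size_bytes: int,
    num_speakers: Optional[int] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid.

    num_speakers=None means OCI Speech detects the speaker count automatically.
    """
    problems = []
    ext = Path(file_name).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported file format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 2 GB")
    if num_speakers is not None and not 2 <= num_speakers <= 16:
        problems.append("number of speakers must be between 2 and 16")
    return problems
```

For example, check_whisper_job("call.mp3", 10_000_000, 3) returns an empty list, while an unsupported extension or a speaker count outside 2-16 returns a message describing each problem.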

Want to know more?

The OCI Speech service team is committed to empowering you with tools that redefine possibilities, and we look forward to you benefitting from the newly introduced Whisper model multilingual support with diarization capabilities. Contact your Oracle representative to discuss how OCI Speech with diarization can help you unlock the value of your multimedia data and gain the insight you need to bring your business to the next level.

If you're new to Oracle Cloud Infrastructure, try Oracle Cloud Free Trial, a free 30-day trial with US$300 in credits.
