Machine-readable pronunciations From Wikipedia, the free encyclopedia
The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research.
Developer(s) | Carnegie Mellon University |
---|---|
Stable release | 0.7b / November 19, 2014 |
Available in | English |
License | BSD |
Website | www |
CMUdict provides a mapping from orthographic to phonetic forms for English words in their North American pronunciations. It is commonly used to generate representations for speech recognition (ASR), e.g. the CMU Sphinx system, and speech synthesis (TTS), e.g. the Festival system. CMUdict can also be used as a training corpus for building statistical grapheme-to-phoneme (g2p) models[1] that generate pronunciations for words not yet included in the dictionary.
The most recent release is 0.7b; it contains over 134,000 entries. An interactive lookup version is available.[2]
The database is distributed as a plain text file with one entry per line in the format "WORD  <pronunciation>", with a two-space separator between the parts. If multiple pronunciations are available for a word, variants are identified using numbered versions (e.g. WORD(1)). The pronunciation is encoded using a modified form of the ARPABET system, with the addition of stress marks on vowels at levels 0, 1, and 2. A line-initial ;;; token indicates a comment. A derived format, directly suitable for speech recognition engines, is also available as part of the distribution; this format collapses stress distinctions (typically not used in ASR).
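The file format described above can be read with a few lines of code. The following Python sketch is illustrative only and is not part of the CMUdict distribution: the function name `parse_cmudict` and the sample lines are assumptions made for the example. It relies only on the conventions stated above (two-space separator, numbered variants such as WORD(1), and ;;; comment lines).

```python
import re
from collections import defaultdict

# Matches a headword with an optional variant suffix such as "(1)".
_VARIANT = re.compile(r"^(?P<word>.+?)(?:\((?P<n>\d+)\))?$")


def parse_cmudict(lines):
    """Return {word: [phoneme list, ...]} from CMUdict-style lines.

    Hypothetical helper for illustration; variants of a word are
    grouped under the same headword in file order.
    """
    entries = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        head, _, pron = line.partition("  ")  # two-space separator
        if not pron:
            continue  # not a well-formed entry
        word = _VARIANT.match(head).group("word")
        entries[word].append(pron.split())
    return dict(entries)


# Sample lines written in the documented format (illustrative excerpt).
sample = [
    ";;; comment line",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
    "TOMATO  T AH0 M EY1 T OW2",
]
pronunciations = parse_cmudict(sample)
```

Here `pronunciations["READ"]` holds two phoneme lists, one per variant. Grouping variants under a single key is a design choice of this sketch; a consumer that needs the variant numbers could keep the `n` group from the regular expression instead.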
The following is a table of phonemes used by the CMU Pronouncing Dictionary.[2]
Stress mark | Description |
---|---|
0 | No stress |
1 | Primary stress |
2 | Secondary stress |
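As a rough illustration of how the stress marks are used, the Python sketch below assumes each vowel symbol carries its stress level as a trailing digit (as in EY1). The helper names `strip_stress` and `stress_pattern` are hypothetical. The first mimics the collapsing of stress distinctions done by the derived ASR format; the second extracts a word's stress pattern.

```python
def strip_stress(phonemes):
    """Drop trailing stress digits, e.g. 'EY1' -> 'EY'."""
    return [p.rstrip("012") for p in phonemes]


def stress_pattern(phonemes):
    """Concatenate the stress digit of each vowel, in order."""
    return "".join(p[-1] for p in phonemes if p and p[-1] in "012")


# Illustrative pronunciation written with stress-marked vowels.
phones = ["T", "AH0", "M", "EY1", "T", "OW2"]
plain = strip_stress(phones)      # ['T', 'AH', 'M', 'EY', 'T', 'OW']
pattern = stress_pattern(phones)  # '012'
```

Consonants carry no digit, so they pass through `strip_stress` unchanged and contribute nothing to the pattern.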
Version | Release date[3] | License |
---|---|---|
0.1 | 16 September 1993 | Public Domain |
0.2 | 10 March 1994 | Public Domain |
0.3 | 28 September 1994 | Public Domain |
0.4 | 8 November 1995 | Public Domain |
0.5 | No public release | Public Domain |
0.6 | 11 August 1998 | Public Domain |
0.7 | No public release | Public Domain |
0.7a | 18 February 2008 | 2-clause BSD |
0.7b | 19 November 2014[4] | 2-clause BSD |
GitHub (unversioned) | 26 May 2021 | 2-clause BSD |