3 May 2004
Features of DEMOSTHeNES version 2
DEMOSTHeNES is a multi-voice text-to-speech (TtS) system. It offers a selection of male and female voices, as well as "character" definitions based on these voices. Each character defines its own TtS process (e.g. letter-to-sound conversion, prosody model), allowing the produced speech to be personalized.
DEMOSTHeNES’s novel architecture allows efficient implementations to be developed. In server mode, DEMOSTHeNES can serve each session at more than 200 times real time, which makes many simultaneous channels available to telecom applications.
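To make the throughput figure concrete, the short sketch below works out what a speed of 200 times real time implies. The function names are illustrative and are not part of DEMOSTHeNES. The channel figure is an idealized upper bound: it assumes synthesis is the only cost and that the load divides evenly across channels.

```python
import math


def synthesis_time(audio_seconds: float, speedup: float = 200.0) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of speech
    when the engine runs at `speedup` times real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def ideal_channel_bound(speedup: float = 200.0) -> int:
    """Idealized upper bound on the number of channels one engine could
    keep supplied with audio in real time (assumption: no other overhead)."""
    return math.floor(speedup)


if __name__ == "__main__":
    # Ten seconds of speech at 200 times real time.
    print(synthesis_time(10.0))
    print(ideal_channel_bound())
```

Under these assumptions, ten seconds of speech takes about 50 ms to produce. Real deployments would see a lower channel count once I/O, scheduling and uneven load are taken into account.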
The Text Analyzer is based on a finite state automata (FSA) engine and is able to identify:
More than 800 acronyms in all declensions, with configurable pronunciation: for example, 'το Ι.Κ.Α.' or 'το ΙΚΑ' can be pronounced either as 'το Ίδρυμα Κοινωνικών Ασφαλίσεων' or as 'το Ίκα'.
Several forms of dates and times, e.g. '21/2/2001' -> 'Εικοσιμία δευτέρου του δύο χιλιάδες ένα' and '18:45' -> 'Δεκαοκτώ και σαράνταπέντε'
Numerics, Latin numerals and Greek numerals
Abbreviations (e.g. 'κλπ' or 'κ.λ.π.' -> 'και λοιπά')
Other marks (e.g. '(' -> 'παρένθεση' and ')' -> 'κλείνει παρένθεση')
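The token categories listed above can be illustrated with a minimal sketch. It is not the DEMOSTHeNES engine: Python regular expressions stand in for the FSA engine, the lookup tables contain only the examples quoted above, and the names `classify`, `expand` and `acronym_style` are hypothetical. Spelling out dates, times and numbers as words is left out.

```python
import re

# Lookup tables built only from the examples quoted in the list above.
ABBREVIATIONS = {"κλπ": "και λοιπά", "κ.λ.π.": "και λοιπά"}
MARKS = {"(": "παρένθεση", ")": "κλείνει παρένθεση"}
ACRONYMS = {
    "ΙΚΑ": {"full": "Ίδρυμα Κοινωνικών Ασφαλίσεων", "short": "Ίκα"},
}

# Simplified patterns (assumption: day/month/year dates, hour:minute times).
DATE_RE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")
TIME_RE = re.compile(r"\d{1,2}:\d{2}")
NUMBER_RE = re.compile(r"\d+")


def classify(token: str) -> str:
    """Return the category of a single token."""
    if token in MARKS:
        return "mark"
    if token in ABBREVIATIONS:
        return "abbreviation"
    if token.replace(".", "") in ACRONYMS:
        return "acronym"
    if DATE_RE.fullmatch(token):
        return "date"
    if TIME_RE.fullmatch(token):
        return "time"
    if NUMBER_RE.fullmatch(token):
        return "number"
    return "word"


def expand(token: str, acronym_style: str = "full") -> str:
    """Expand marks, abbreviations and acronyms; pass other tokens through.

    `acronym_style` ("full" or "short") mimics the configurable
    pronunciation of acronyms described above.
    """
    kind = classify(token)
    if kind == "mark":
        return MARKS[token]
    if kind == "abbreviation":
        return ABBREVIATIONS[token]
    if kind == "acronym":
        return ACRONYMS[token.replace(".", "")][acronym_style]
    return token


if __name__ == "__main__":
    for tok in ["21/2/2001", "18:45", "κ.λ.π.", "(", "Ι.Κ.Α."]:
        print(tok, classify(tok), expand(tok, acronym_style="short"))
```

The checks run from the most specific category to the most general, so a token such as 'κ.λ.π.' is treated as an abbreviation before the acronym test is tried.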
In DEMOSTHeNES, text is analyzed to extract grammatical and syntactic information. This information is exploited during prosody generation to give a more realistic tonal balance to words of different parts of speech.
The Pronunciation Generator deals with the coarticulation effects of each language in order to best convey the pronunciation of words. Currently, it supports Greek and English.
DEMOSTHeNES is a polyglot system, which means it can handle text in more than one language at the same time (e.g. a Greek document that contains an English paragraph). It currently supports Greek and English, both with Greek pronunciation.
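One simple way to picture mixed-language handling is to split the text into runs by script. The sketch below is an assumption made for illustration, not the method DEMOSTHeNES uses: it labels words written in Greek script as "el" and words in ASCII Latin letters as "en", which is only reasonable because Greek and English are the two supported languages. The names `script_of` and `language_runs` are hypothetical.

```python
import re

WORD_RE = re.compile(r"\w+")


def script_of(word: str):
    """Guess the language of a word from its first alphabetic character.

    Returns "el" for Greek script, "en" for ASCII Latin letters, and
    None when the word has no such letters (e.g. digits only).
    """
    for ch in word:
        if "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff":
            return "el"
        if ch.isascii() and ch.isalpha():
            return "en"
    return None


def language_runs(text: str, default: str = "el"):
    """Group consecutive words of the same language into runs."""
    runs = []
    for word in WORD_RE.findall(text):
        lang = script_of(word)
        if lang is None:
            # Script-less tokens inherit the language of the current run.
            lang = runs[-1][0] if runs else default
        if runs and runs[-1][0] == lang:
            runs[-1][1].append(word)
        else:
            runs.append((lang, [word]))
    return [(lang, " ".join(words)) for lang, words in runs]


if __name__ == "__main__":
    print(language_runs("Καλημέρα hello world κόσμε"))
```

Each run could then be handed to the letter-to-sound rules of its language. Script detection is a heuristic: it would not distinguish English from other languages written in Latin letters.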
DEMOSTHeNES introduces features for increasing the naturalness and reducing the predictability of the produced speech. In version 2, prosody generation is based on machine learning approaches applied to large speech corpora.
DEMOSTHeNES comes bundled with 3 new di-cluster based natural Greek voices (2 male and 1 female) and the MBROLA synthesizer. These voices consist of 1081 di-clusters that capture most of the co-articulation events in Greek, and have been carefully recorded so that each cluster accentuates its acoustic features.
DEMOSTHeNES’s component architecture allows it to be expanded in several ways. New languages, new hues and voices, signal processing modules, language processing modules and much more can be ported to this platform.
The modules of DEMOSTHeNES are fully customizable; furthermore, they can form independent applications (e.g. the Greek-to-IPA converter, which can be used in dictionary applications).
The operation of DEMOSTHeNES can be configured per module (the extent of configuration depends on the version). For example, the end user can select whether acronyms will be expanded or not. Furthermore, DEMOSTHeNES can be bundled with other synthesizers, such as the formant-based synthesizer module VMOD_FORMANT, which currently delivers a lower quality voice than MBROLA does.