DEMOSTHeNES Speech Composer

Gerasimos Xydas
gxydas@di.uoa.gr
Last updated: 3 May 2004

Features of DEMOSTHeNES version 2

Multi Characters

DEMOSTHeNES is a multi-voice text-to-speech system. It offers a selection of different voices (male and female), as well as the definition of "characters" based on these voices. Each character defines its own TtS process (e.g. letter-to-sound conversion, prosody model), allowing the produced speech to be personalized.
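A character can be thought of as a base voice plus its own processing settings. The following minimal Python sketch models that idea; the class names `Voice` and `Character`, and the `pitch_scale` and `rate` parameters, are hypothetical and do not come from DEMOSTHeNES itself.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Voice:
    """A recorded voice database (hypothetical model)."""
    name: str
    sex: str  # "male" or "female"


@dataclass(frozen=True)
class Character:
    """A character: a base voice plus its own TtS settings (hypothetical)."""
    name: str
    voice: Voice
    letter_to_sound: Callable[[str], str]  # per-character pronunciation rules
    pitch_scale: float = 1.0  # multiplies the prosody model's F0
    rate: float = 1.0         # > 1.0 speaks faster (shorter durations)

    def adjust(self, f0_hz: float, duration_ms: float) -> Tuple[float, float]:
        """Apply this character's prosody settings to one prosodic target."""
        return f0_hz * self.pitch_scale, duration_ms / self.rate


male = Voice("male1", "male")
narrator = Character("narrator", male, str.lower)
excited = Character("excited", male, str.lower, pitch_scale=1.25, rate=1.25)
print(narrator.adjust(100.0, 100.0))  # (100.0, 100.0)
print(excited.adjust(100.0, 100.0))   # (125.0, 80.0)
```

Both characters share one recorded voice, so a new character costs only a settings record, not a new recording.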

Performance

The novel architecture of DEMOSTHeNES allows efficient implementations to be developed. In server mode, DEMOSTHeNES can serve each session at more than 200 times real time, so it can offer many channels in telecom applications.
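As a rough, idealized reading of the 200x figure: if a session is synthesized 200 times faster than it is spoken, one real-time channel needs only about 1/200 of a processor. The sketch below turns that into an upper bound on the channel count; the function name and the `reserve` parameter (a fraction of CPU kept free for other work) are assumptions, and real deployments also pay for I/O, scheduling and network overhead.

```python
import math


def max_realtime_channels(speedup: float, cpus: int = 1, reserve: float = 0.0) -> int:
    """Idealized upper bound on simultaneous real-time channels.

    speedup: how many times faster than real time one session is synthesized.
    cpus:    number of processors, assumed to scale linearly.
    reserve: fraction of each processor kept free for other work.
    """
    if speedup <= 0 or not 0.0 <= reserve < 1.0:
        raise ValueError("speedup must be positive and reserve in [0, 1)")
    # Each channel consumes 1/speedup of a processor, so one processor
    # can carry speedup * (1 - reserve) channels.
    return math.floor(cpus * speedup * (1.0 - reserve))


print(max_realtime_channels(200))                # 200
print(max_realtime_channels(200, reserve=0.25))  # 150
```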

Text Analysis

The Text Analyzer is based on a finite state automata (FSA) engine and can identify:

More than 800 acronyms in all declensions, with configurable pronunciation: for example, 'το Ι.Κ.Α.' or 'το ΙΚΑ' can be pronounced either as 'το Ίδρυμα Κοινωνικών Ασφαλίσεων' or as 'το Ίκα'.

Several forms of dates and times, e.g. '21/2/2001' -> 'Εικοσιμία δευτέρου του δύο χιλιάδες ένα' and '18:45' -> 'Δεκαοκτώ και σαράνταπέντε'

Numbers, as well as Latin (Roman) and Greek numerals

Abbreviations (e.g. 'κλπ' or 'κ.λ.π.' -> 'και λοιπά')

Other marks (e.g. '(' -> 'παρένθεση' and ')' -> 'κλείνει παρένθεση')
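The automaton-based approach can be illustrated with a small deterministic finite automaton that classifies a token as a time, a date or a plain number. This Python sketch is an illustration under assumptions: the states, the accepted patterns (H:MM or HH:MM times, D/M/YY or D/M/YYYY dates) and the labels are hypothetical, and range checks such as hour < 24 would be a separate step.

```python
from typing import Optional

# Transition table of a deterministic finite automaton over character
# classes: "D" = any ASCII digit, ":" and "/" stand for themselves.
TRANSITIONS = {
    ("start", "D"): "n1",
    ("n1", "D"): "n2",
    ("n2", "D"): "n3+",
    ("n3+", "D"): "n3+",
    # Times: H:MM or HH:MM
    ("n1", ":"): "colon",
    ("n2", ":"): "colon",
    ("colon", "D"): "min1",
    ("min1", "D"): "time",
    # Dates: day and month of 1-2 digits, year of 2 or 4 digits
    ("n1", "/"): "slash1",
    ("n2", "/"): "slash1",
    ("slash1", "D"): "mon1",
    ("mon1", "D"): "mon2",
    ("mon1", "/"): "slash2",
    ("mon2", "/"): "slash2",
    ("slash2", "D"): "y1",
    ("y1", "D"): "y2",
    ("y2", "D"): "y3",
    ("y3", "D"): "y4",
}

# Accepting states and the token type each one signals.
ACCEPTING = {"n1": "NUMBER", "n2": "NUMBER", "n3+": "NUMBER",
             "time": "TIME", "y2": "DATE", "y4": "DATE"}


def char_class(ch: str) -> str:
    return "D" if ch in "0123456789" else ch


def classify(token: str) -> Optional[str]:
    """Run the automaton; return the token type, or None if rejected."""
    state = "start"
    for ch in token:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition: reject
            return None
    return ACCEPTING.get(state)


for tok in ["21/2/2001", "18:45", "2001", "18:4", "κλπ"]:
    print(tok, classify(tok))
```

A table-driven automaton like this reads each character once, which is one reason FSA engines are a common choice for fast text normalization.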

Natural Language Processing

In DEMOSTHeNES, text is analyzed to extract grammatical and syntactic information. This information is exploited during prosody generation to give words of different parts of speech a more realistic tonal balance.
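One simple way part-of-speech information can feed prosody is to give content words a larger pitch accent than function words. The sketch below assumes a tagger has already run; the tag names, the prominence weights and the `base_f0`/`max_rise` values are illustrative assumptions, not the DEMOSTHeNES model.

```python
from typing import List, Tuple

# Hypothetical prominence weights: content words get stronger accents
# than function words such as articles and prepositions.
PROMINENCE = {"NOUN": 1.0, "VERB": 0.9, "ADJ": 0.9, "ADV": 0.8,
              "PRON": 0.5, "DET": 0.3, "ADP": 0.3, "CONJ": 0.3}
DEFAULT_PROMINENCE = 0.6  # for tags not listed above


def accent_peaks(tagged: List[Tuple[str, str]],
                 base_f0: float = 120.0,
                 max_rise: float = 40.0) -> List[Tuple[str, float]]:
    """Map (word, POS tag) pairs to a peak F0 in Hz for each word's accent."""
    return [(word, base_f0 + max_rise * PROMINENCE.get(tag, DEFAULT_PROMINENCE))
            for word, tag in tagged]


sentence = [("το", "DET"), ("σπίτι", "NOUN"), ("είναι", "VERB"), ("μεγάλο", "ADJ")]
for word, peak in accent_peaks(sentence):
    print(f"{word}: {peak:.0f} Hz")
```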

Pronunciation Generator

The Pronunciation Generator deals with the coarticulation effects of each language in order to convey the pronunciation of words accurately. It currently supports Greek and English.
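A rule-based letter-to-sound converter can be sketched as greedy longest-match rewriting plus a few context rules. The Python sketch below is a deliberately small assumption, not the DEMOSTHeNES rule set: it maps common Greek digraphs and letters to IPA and implements one coarticulation rule (/s/ becomes /z/ before a voiced consonant, as in κόσμος /kozmos/). It drops stress marks and diaeresis and ignores other context effects, such as the palatalization of κ and γ before front vowels and the αυ/ευ alternation.

```python
import unicodedata

# Hypothetical, minimal rule tables (not the DEMOSTHeNES rule set).
DIGRAPHS = {"ου": "u", "αι": "e", "ει": "i", "οι": "i", "υι": "i",
            "μπ": "b", "ντ": "d", "γκ": "g", "τσ": "ts", "τζ": "dz"}
LETTERS = {"α": "a", "β": "v", "γ": "ɣ", "δ": "ð", "ε": "e", "ζ": "z",
           "η": "i", "θ": "θ", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
           "ν": "n", "ξ": "ks", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
           "ς": "s", "τ": "t", "υ": "i", "φ": "f", "χ": "x", "ψ": "ps",
           "ω": "o"}
VOICED_CONSONANTS = set("βγδζλμνρ")


def strip_marks(word: str) -> str:
    """Lowercase and remove accents/diaeresis (loses stress information)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def to_ipa(word: str) -> str:
    w = strip_marks(word)
    out, i = [], 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DIGRAPHS:              # longest match first
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = w[i]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if ch == "σ" and nxt in VOICED_CONSONANTS:
            out.append("z")               # coarticulation: /s/ -> /z/
        else:
            out.append(LETTERS.get(ch, ch))
        i += 1
    return "".join(out)


for word in ["κόσμος", "ντομάτα", "ουρανός", "μπαμπάς"]:
    print(word, to_ipa(word))
```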

Polyglot

DEMOSTHeNES is a polyglot system, meaning that it can handle text in more than one language at the same time (e.g. a Greek document that contains an English paragraph). It currently supports Greek and English (both currently with Greek pronunciation).
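Before mixed-language text can be routed to the right language modules, it has to be split into same-language runs. A minimal sketch, assuming that the Unicode script of each word's letters (Greek vs. Latin) is enough to tell Greek from English; the function names and the Greek default are hypothetical, and a real system would need more than script detection for languages sharing an alphabet.

```python
from typing import List, Optional, Tuple


def script_language(token: str) -> Optional[str]:
    """Guess 'el' or 'en' from the letters' Unicode script; None if no letters."""
    greek = sum(1 for c in token
                if "\u0370" <= c <= "\u03ff" or "\u1f00" <= c <= "\u1fff")
    latin = sum(1 for c in token if c.isascii() and c.isalpha())
    if greek == 0 and latin == 0:
        return None
    return "el" if greek >= latin else "en"


def segment(text: str, default: str = "el") -> List[Tuple[str, str]]:
    """Split text into runs of same-language words.

    Tokens without letters (numbers, punctuation) inherit the language of
    the previous word, or `default` at the start of the text.
    """
    runs: List[Tuple[str, List[str]]] = []
    for token in text.split():
        lang = script_language(token) or (runs[-1][0] if runs else default)
        if runs and runs[-1][0] == lang:
            runs[-1][1].append(token)
        else:
            runs.append((lang, [token]))
    return [(lang, " ".join(words)) for lang, words in runs]


print(segment("Το σύστημα λέγεται text to speech στα αγγλικά"))
```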

Prosody Generator

DEMOSTHeNES introduces features for increasing the naturalness and reducing the predictability of the produced speech. In version 2, prosody generation is based on machine learning models trained on large speech corpora.
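As a much simpler stand-in for corpus-trained prosody models (not the method DEMOSTHeNES uses), a data-driven duration model can be sketched as the mean duration of each phone in each context, estimated from a labeled corpus. The observations below are made-up toy values, and the context (phone plus a stressed/unstressed flag) and the fallback duration are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Context = Tuple[str, bool]  # (phone, is_stressed)


def train_duration_model(
        observations: Iterable[Tuple[str, bool, float]]) -> Dict[Context, float]:
    """Estimate the mean duration (ms) for every (phone, stressed) context."""
    buckets: Dict[Context, List[float]] = defaultdict(list)
    for phone, stressed, duration_ms in observations:
        buckets[(phone, stressed)].append(duration_ms)
    return {ctx: mean(values) for ctx, values in buckets.items()}


def predict_duration(model: Dict[Context, float], phone: str,
                     stressed: bool, fallback_ms: float = 80.0) -> float:
    """Look up the learned duration; use a fallback for unseen contexts."""
    return model.get((phone, stressed), fallback_ms)


# Toy, made-up observations standing in for an annotated speech corpus.
toy_corpus = [("a", True, 110.0), ("a", True, 130.0),
              ("a", False, 70.0), ("a", False, 90.0),
              ("s", False, 95.0)]
model = train_duration_model(toy_corpus)
print(predict_duration(model, "a", True))   # 120.0
print(predict_duration(model, "a", False))  # 80.0
print(predict_duration(model, "k", False))  # 80.0 (fallback)
```

Richer models replace the lookup table with a learner over many more context features, but the train-then-predict structure stays the same.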

Voices

DEMOSTHeNES comes bundled with three new di-cluster-based natural Greek voices (two male and one female) and the MBROLA synthesizer. These voices consist of 1081 di-clusters that capture most of the coarticulation events in Greek, and have been carefully recorded so that each cluster accentuates its acoustic features.
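MBROLA is driven by plain-text .pho input in which each line holds a phoneme name, its duration in milliseconds and optional pairs of (position within the phoneme in percent, pitch in Hz). The sketch below writes such lines from hypothetical prosody targets; the function name is an assumption, the phoneme names must match the phoneme set of the MBROLA voice database in use, and the values shown are invented for illustration.

```python
from typing import List, Sequence, Tuple

# (phoneme, duration in ms, [(position %, F0 Hz), ...])
Segment = Tuple[str, int, Sequence[Tuple[int, int]]]


def to_pho(segments: List[Segment]) -> str:
    """Format segments as MBROLA .pho lines."""
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position, f0 in pitch_points:
            fields += [str(position), str(f0)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical targets for the syllable "ka" surrounded by silence ("_").
print(to_pho([("_", 50, []), ("k", 70, []),
              ("a", 120, [(0, 110), (100, 130)]), ("_", 50, [])]), end="")
```

The resulting text can be saved to a file and passed to the `mbrola` command-line tool together with a voice database, typically as `mbrola <database> input.pho output.wav`.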

Expandability

The component architecture of DEMOSTHeNES allows it to be extended in several ways: new languages, new hues and voices, signal processing modules, language processing modules and much more can be ported to the platform.
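A component architecture of this sort is often built around a registry: each module registers itself under a name, and a configuration lists which modules run and in what order. The Python sketch below illustrates the pattern with two trivial text modules; the `register` decorator, the module names and the string-to-string interface are assumptions (real TtS stages would pass richer structures such as phone lists and prosody targets).

```python
from typing import Callable, Dict, List

Module = Callable[[str], str]
REGISTRY: Dict[str, Module] = {}


def register(name: str) -> Callable[[Module], Module]:
    """Decorator that makes a module available under a configuration name."""
    def decorator(fn: Module) -> Module:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("collapse_spaces")
def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


@register("lowercase")
def lowercase(text: str) -> str:
    return text.lower()


def build_pipeline(names: List[str]) -> Module:
    """Chain registered modules in the order given by a configuration."""
    stages = [REGISTRY[name] for name in names]  # KeyError for unknown modules

    def run(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return run


pipeline = build_pipeline(["collapse_spaces", "lowercase"])
print(pipeline("  ΓΕΙΑ   σου  "))  # γεια σου
```

Adding a module then means writing one registered function; no existing stage has to change.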

Customization

The modules of DEMOSTHeNES are fully customizable and, furthermore, can form independent applications (e.g. the Greek-to-IPA converter, for use in dictionary applications).

Other features

The operation of DEMOSTHeNES can be configured per module (the extent of configuration depends on the version). For example, the end user can select whether or not acronyms will be expanded. Furthermore, DEMOSTHeNES can be bundled with other synthesizers, such as the formant-based synthesizer module VMOD_FORMANT, which currently delivers lower voice quality than MBROLA.
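The acronym option can be sketched as a lexicon lookup controlled by a user setting. In this minimal Python sketch the lexicon entries reuse the examples from the Text Analysis section, while the names `ACRONYMS`, `ABBREVIATIONS`, `normalize` and `expand_acronyms` are hypothetical; dots are stripped before lookup so that 'Ι.Κ.Α.' and 'ΙΚΑ' hit the same entry, and declined forms are not handled.

```python
from typing import Dict, Tuple

# Hypothetical lexicon entries, keyed by the acronym with dots removed:
# (full expansion, reading as a word).
ACRONYMS: Dict[str, Tuple[str, str]] = {
    "ΙΚΑ": ("Ίδρυμα Κοινωνικών Ασφαλίσεων", "Ίκα"),
}
ABBREVIATIONS: Dict[str, str] = {"κλπ": "και λοιπά"}


def normalize(text: str, expand_acronyms: bool = True) -> str:
    """Replace known acronyms and abbreviations, honoring the user option."""
    out = []
    for token in text.split():
        key = token.replace(".", "")
        if key in ACRONYMS:
            expansion, as_word = ACRONYMS[key]
            out.append(expansion if expand_acronyms else as_word)
        elif key in ABBREVIATIONS:
            out.append(ABBREVIATIONS[key])
        else:
            out.append(token)  # unknown tokens pass through unchanged
    return " ".join(out)


print(normalize("το Ι.Κ.Α."))                      # το Ίδρυμα Κοινωνικών Ασφαλίσεων
print(normalize("το ΙΚΑ", expand_acronyms=False))  # το Ίκα
print(normalize("βιβλία, τετράδια κ.λ.π."))        # βιβλία, τετράδια και λοιπά
```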