Speech Synthesis, or Talk to me Tcl

It hurts me to advocate Windows, but using tcom with the free speech engine from Microsoft is almost too easy (yes, I know Microsoft and free speech sounds like an oxymoron :)

 package require tcom
 
 set voice [::tcom::ref createobject Sapi.SpVoice]
 
 $voice Speak "Hello World" 1
 
 after 3000
 exit

The speech SDK is available from http://www.microsoft.com/speech/download/sdk51/ (51MB). There is also a beta of .Net Speech at http://www.microsoft.com/speech/ (200MB), but I haven't tried that yet. - VPT

MG Anyone happen to have a link for more info on this? I've been using it successfully for a couple of weeks in something, but someone just found an error -

  $voice Speak "<> test <" 1

returns "0x80045042 {Unknown error}". I did a search on the Microsoft website for the code which, naturally, came up blank. Any ideas would be appreciated :)

NEM On Mac OS X there is a command-line "say" program for accessing the built-in speech synthesis capabilities of the OS:

 exec /usr/bin/say "Hello World"

WHD Ubuntu Linux (and I suspect other distributions) ships with a similar command called "espeak":

  $ espeak "Hello World"
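
The two commands above can be wrapped so that a script picks whichever one fits the platform. This is only a minimal sketch: speechCommand and say are hypothetical helper names, and it assumes that say lives at /usr/bin/say on Mac OS X and that espeak is on the PATH everywhere else.

```tcl
# Return the external command (as a list) that would speak $text on the given OS.
# $os defaults to the running platform; pass it explicitly to try other branches.
proc speechCommand {text {os ""}} {
    if {$os eq ""} {
        set os $::tcl_platform(os)
    }
    if {$os eq "Darwin"} {
        return [list /usr/bin/say $text]
    }
    # Assumption: every other system has espeak installed.
    return [list espeak $text]
}

# Speak the text by running the chosen command (needs Tcl 8.5+ for {*}).
proc say {text} {
    exec {*}[speechCommand $text]
}
```

Note that exec treats arguments beginning with <, > or | as redirections, so this sketch is only suitable for plain sentences.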

GPS Rsynth is a nice public domain package that I've used for speech synthesis. I wrote a say_this.tcl script that built a GUI for tweaking the voice. I've been thinking about improving Rsynth, because it seems to be dead at the moment. Another tool that I've heard good things about is Festival[L1 ]. CMU's Sphinx[L2 ] is another tool that may be good, but I haven't heard from users of it.

See Festtcl for a Tcl interface to Festival.

Tcl'ers may be interested in the CSLU Toolkit [L3 ] developed at the Oregon Graduate Institute's Center for Spoken Language Understanding. It includes a RAD environment which supports Tcl and provides tools to do Speech Recognition as well as speech synthesis. -- aricb

JKM would like to know if MG found any solution to his "0x80045042 {Unknown error}". I'm getting the same error, but it may be for a different reason. By default, the SAPI tries to interpret the string with XML tags if the first character is '<'. Change your 1 to a 17 and you should be alright. I'm getting the same error with

  $voice Speak "c:/code/tcl/SAPI2.txt" 5

any help would be appreciated.

MG never did, I'm afraid

MG Having looked on the Microsoft website - for once, it's actually returned something sensible, searching for "sapi.spvoice" on www.microsoft.com - it seems that '5' means "speak a file", the 17 you mentioned before means "speak without parsing XML", and 1 is the default. I can replicate the "Unknown Error", using '5', when the file in question doesn't exist. To quote one page of the MS website, the argument must be "a null-terminated, fully qualified path to a file". The page in question is [L4 ], and seems (as of July 10 2005, before they change the address) to be about the first page to start looking at for this method of speech.

ET: Hey, this is pretty cool. I also downloaded the documentation and found these flags, so the 5 would be 4+1 or async and filename:

 Enum SpeechVoiceSpeakFlags
 'SpVoice flags
 SVSFDefault = 0
 SVSFlagsAsync = 1
 SVSFPurgeBeforeSpeak = 2
 SVSFIsFilename = 4
 SVSFIsXML = 8
 SVSFIsNotXML = 16
 SVSFPersistXML = 32
 
 'Normalizer flags
 SVSFNLPSpeakPunc = 64
 
 End Enum

SVSFDefault

Specifies that the default settings should be used. The defaults are:
To speak the given text string synchronously (override with SVSFlagsAsync),
Not to purge pending speak requests (override with SVSFPurgeBeforeSpeak),
To parse the text as XML only if the first character is a left-angle-bracket (override with SVSFIsXML or SVSFIsNotXML),
Not to persist global XML state changes across speak calls (override with SVSFPersistXML), and
Not to expand punctuation characters into words (override with SVSFNLPSpeakPunc).

SVSFlagsAsync

Specifies that the Speak call should be asynchronous. That is, it will return immediately after the speak request is queued.

SVSFPurgeBeforeSpeak

Purges all pending speak requests prior to this speak call.

SVSFIsFilename

The string passed to the Speak method is a file name rather than text.
As a result, the string itself is not spoken; instead,
the file that the path points to is spoken.

SVSFIsXML

The input text will be parsed for XML markup.

SVSFIsNotXML

The input text will not be parsed for XML markup.

SVSFPersistXML

Global state changes in the XML markup will persist across speak calls.

SVSFNLPSpeakPunc

Punctuation characters should be expanded into words (e.g. "This is it." would become "This is it period").
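
Since the flags are bit values, the numbers passed to Speak earlier on this page (1, 5, 17) are just sums of the entries above. The following sketch shows one way to build them by name; speakFlags is a hypothetical helper, not part of tcom or SAPI, and the names are the enum names with the SVSF prefix dropped.

```tcl
# Values copied from the SpeechVoiceSpeakFlags enum above.
set ::SVSF {
    Default 0   Async 1      PurgeBeforeSpeak 2   IsFilename 4
    IsXML 8     IsNotXML 16  PersistXML 32        NLPSpeakPunc 64
}

# Combine the named flags with bitwise OR and return the integer for Speak.
proc speakFlags {args} {
    set flags 0
    foreach name $args {
        if {![dict exists $::SVSF $name]} {
            error "unknown speak flag \"$name\""
        }
        set flags [expr {$flags | [dict get $::SVSF $name]}]
    }
    return $flags
}

puts [speakFlags Async IsNotXML]    ;# 1 | 16 = 17
puts [speakFlags Async IsFilename]  ;# 1 | 4  = 5
```

With a voice object created as at the top of the page, the string that failed for MG could then presumably be spoken with $voice Speak "<> test <" [speakFlags Async IsNotXML], which is the same as the 17 that JKM suggested.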

ET I found the following additional commands:

 $voice Rate       ;# return the rate (can be changed while reading a file async)
 $voice Rate value ;# set the rate of speech. Seems to take values like, -5,-4,...0...1...5,6,...

 $voice Skip Sentence N ;# N is plus or minus, this is while it's reading a file

 $voice Volume     ;# return volume
 $voice Volume N   ;# set volume to N

 $voice Pause      ;# these 2 work while reading a file, using option 5 (async, file)
 $voice Resume

There are others that seem to return a handle, but I can't figure out how to then use them. The one I wanted to play with was Voice, you can do this

 set v [$voice Voice]
 $v

which then gives you

usage: handle ?options? method ?arg ...?

But I couldn't figure out what to do next. Too bad it doesn't return a list of permitted options like many tcl commands do.
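
On the wish for a list of permitted options: tcom has an introspection command, ::tcom::info interface, which may help here. The sketch below is written from memory of the tcom documentation and has not been run against SAPI, so treat the layout of each method description (the name being the third element) as an assumption. listMethods is a hypothetical helper name, and tcom is assumed to be loaded already, as in the scripts above.

```tcl
# Return the sorted method names that a tcom handle's interface describes.
# Assumption: each entry returned by "$iface methods" is a list whose
# third element is the method name.
proc listMethods {handle} {
    set iface [::tcom::info interface $handle]
    set names {}
    foreach description [$iface methods] {
        lappend names [lindex $description 2]
    }
    return [lsort -unique $names]
}
```

If this works as assumed, puts [listMethods $voice] and puts [listMethods [$voice Voice]] should show what each handle accepts.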

MG has made a little headway.

  set v [$voice Voice]
  puts "You are speaking with [$v GetDescription]"

Which, for me, shows

  You are speaking with Microsoft Sam

which is the default voice on my computer. You can also use

  set current [$voice Voice]
  set list [$voice GetVoices]
  set howmany [$list Count]
  for {set i 0} {$i < $howmany} {incr i} {
       set person [$list Item $i]
       puts "Voice $i is called [$person GetDescription], and is [set gender [$person GetAttribute Gender]]."
       puts "Let's make [expr {$gender == "Female" ? "her" : "him"}] speak."
       $voice Voice $person
       $voice Speak "Hello, my name is [$person GetAttribute Name]"
      }
  $voice Voice $current ;# return to original

to get a list of all the voices you have, and show their name/gender. ($person GetDescription is the same as $person GetAttribute Name).

As you can see, you can change the voice by using

  $voice Voice <newVoice>

where <newVoice> is a voice of the form [[$voice GetVoices] Item $number]
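
Putting the pieces above together, a voice can be chosen by its description instead of by index. This is a sketch only: selectVoiceByName is a hypothetical helper, it uses just the calls shown above (GetVoices, Count, Item, GetDescription, Voice), and it assumes $voice was created with tcom as in the earlier scripts.

```tcl
# Switch $voice to the first installed voice whose description matches the
# glob pattern (case-insensitive), and return that description.
proc selectVoiceByName {voice pattern} {
    set list [$voice GetVoices]
    set howmany [$list Count]
    for {set i 0} {$i < $howmany} {incr i} {
        set person [$list Item $i]
        set description [$person GetDescription]
        if {[string match -nocase $pattern $description]} {
            $voice Voice $person
            return $description
        }
    }
    error "no installed voice matches \"$pattern\""
}
```

For example, selectVoiceByName $voice *sam* would pick Microsoft Sam on a machine where that voice is installed.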

It's also possible to control which device the sound goes to, in much the same way as setting the voice:

  set current [$voice AudioOutput]
  set devices [$voice GetAudioOutputs]
  set howmany [$devices Count]
  for {set i 0} {$i < $howmany} {incr i} {
       set thisdevice [$devices Item $i]
       puts "Device $i is called '[$thisdevice GetDescription]'. Let's play something through it..."
       $voice AudioOutput $thisdevice
       $voice Speak "I am speaking through [$thisdevice GetDescription]"
      }
  $voice AudioOutput $current ;# return to default

Each async speak (of a file or a string) gets queued and will be spoken in turn. To wait until speaking is done, one can use this:

 $voice WaitUntilDone <timeout-milliseconds> ;# 0 returned if a timeout occurred and still speaking, otherwise a 1 if finished

If timeout milliseconds is -1 this will just wait until the voice is done speaking, but this would then hang the event loop until the speech is done. Using a small value (e.g. 10), one can do a polling operation.
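
The polling idea can be sketched with after, so that the event loop keeps running between checks. whenSpeechDone is a hypothetical helper name; the only SAPI call it relies on is WaitUntilDone as described above, and $voice is assumed to be a tcom handle created as in the earlier scripts.

```tcl
# Poll the voice with a short timeout and run $callback at global level
# once speaking has finished. Between polls, control returns to the event loop.
proc whenSpeechDone {voice callback {interval 100}} {
    # WaitUntilDone returns 1 when speech has finished, 0 if it timed out.
    if {[$voice WaitUntilDone 10]} {
        uplevel #0 $callback
    } else {
        after $interval [list whenSpeechDone $voice $callback $interval]
    }
}
```

A typical use would be to start an async Speak, call whenSpeechDone $voice {set ::done 1}, and then either vwait ::done or, in a Tk application, simply carry on.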

TLT Here is a simple script that implements talking Caller ID using a modem:

  # This script opens a Caller ID-enabled modem and speaks the received name after every ring.

  package require tcom

  set Modem com8: ;# modem port
  set Name  ""    ;# current name to speak

  # This procedure responds to messages from the modem. 
  proc modemCallback {chan voice} {

      # Read the message from the modem.    
      set line [gets $chan]
      puts $line

      # If a ring, speak the current name and start a timer to reset the name.
      if {$line == "RING" && $::Name != ""} {
          $voice Speak $::Name
          set cmd {set ::Name ""}
          after cancel $cmd
          after 6000 $cmd
          return
      }

      # Get the name from the "NAME=" line.
      if {![regexp {NAME=(.*)} $line {} name]} {
          return
      }

      # Speak the name.
      set ::Name $name
      $voice Speak $::Name
  }

  # Create a button to show the console.
  pack [button .button -text "Show Console" -command "console show"]
  wm iconify .

  # Create the voice object.
  set voice [::tcom::ref createobject Sapi.SpVoice]

  # Open the modem.  The procedure "modemCallback" will be called when a message is received.
  set chan [open $::Modem w+]
  fconfigure $chan -buffering line
  fileevent $chan readable [list modemCallback $chan $voice]

  # Reset the modem and enable Caller ID.
  puts $chan "ATZ"
  puts $chan "AT+VCID=1"

peterc 2008-11-26: The sound sample quality on the default voice is pretty horrible; today's mobile phones are more clear. Does Microsoft (or any third party) offer better samples anywhere?

TLT This script saves the spoken text as a wave file:

  package require tcom
  
  # Create the objects.
  set voice [::tcom::ref createobject Sapi.SpVoice]
  set fileStream [::tcom::ref createobject Sapi.SpFileStream]
  set audioFormat [::tcom::ref createobject Sapi.SpAudioFormat]
  
  # Set the audio format of the file stream.
  set SAFT11kHz8BitMono 8
  $audioFormat Type $SAFT11kHz8BitMono
  $fileStream Format $audioFormat
  
  # Open the file and attach the stream to the voice.
  set SSFMCreateForWrite 3
  $fileStream Open text.wav $SSFMCreateForWrite False
  $voice AudioOutputStream $fileStream
  
  # Speak the text.
  $voice Speak "This is a wave file"
  $fileStream Close

WJG (06/04/12) Just hacked together a simple proc to read text strings aloud using the espeak Linux command. It's good fun. It also shows how useful the gnocl::setOpts command can be.

#---------------
# espeak.tcl
#---------------
## \file
# File documentation.
#\verbatim
#!/bin/sh
#\
exec tclsh "$0" "$@"
#\endverbatim

package require Gnocl

##
# -f   Text file to speak
# -a   Amplitude, 0 to 200, default is 100
# -g   Word gap. Pause between words, units of 10mS at the default speed
# -l   Line length. If not zero (which is the default), consider
#      lines less than this length as end-of-clause
# -p   Pitch adjustment, 0 to 99, default is 50
# -s   Speed in words per minute, 80 to 390, default is 170
# -v   Use voice file of this name from espeak-data/voices
# -w   Write output to this WAV file, rather than speaking it directly
# -b   Input text encoding, 1=UTF8, 2=8 bit, 4=16 bit
# -m   Interpret SSML markup, and ignore other < > tags
# -q   Quiet, don't produce any speech (may be useful with -x)
# -x   Write phoneme mnemonics to stdout
# -X   Write phoneme mnemonics and translation trace to stdout
# -z   No final sentence pause at the end of the text
# ref: espeak man pages
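proc readAloud {args} {
    # Set defaults, then parse args. gnocl::setOpts leaves each option in a
    # local variable of the same name: a, s, and t (-t is the text to speak).
    gnocl::setOpts "-a 5 -s 110 $args"
    exec espeak -a $a -s $s $t
}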

readAloud -t "How now brown cow. She sells sea shells by the sea shore. Peter Piper picked a peck of pickled peppers."
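
For those without Gnocl, the same defaults-plus-overrides idea can be approximated with a plain dict merge. This is a sketch under assumptions: espeakCommand is a hypothetical helper, it handles only the -a, -s and -t options used by readAloud above, and it returns the command list rather than running it so that it can be inspected first.

```tcl
# Build the espeak command line from option/value pairs.
# Defaults mirror readAloud above; options given by the caller override them.
proc espeakCommand {args} {
    set opts [dict merge {-a 5 -s 110} $args]
    if {![dict exists $opts -t]} {
        error "missing -t option (the text to speak)"
    }
    return [list espeak -a [dict get $opts -a] -s [dict get $opts -s] [dict get $opts -t]]
}

puts [espeakCommand -t "How now brown cow." -s 150]
```

To actually speak, the result can be passed to exec, e.g. exec {*}[espeakCommand -t "How now brown cow."].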

MSA 06/05/12


#neospeech web service api example. 

package require http
package require dom
package require tcom

# Connect over https
package require tls
http::register https 443 ::tls::socket



# voices
#TTS_PAUL_DB
#TTS_KATE_DB
#TTS_JULIE_DB 


namespace eval neospeech {

proc getConvertSimpleVoice {str} {
        set voice TTS_KATE_DB
        set email user@example.com       ;# placeholder: use the address of your own account
        set accountId 1234567890         
        set loginPassword somepassword
        # formatQuery already yields "text=<encoded text>", so append it as is.
        set query [::http::formatQuery text $str]
        set url "https://tts.neospeech.com/rest_1_1.php?method=ConvertSimple&email=$email&accountId=$accountId&loginKey=LoginKey&loginPassword=$loginPassword&voice=$voice&outputFormat=FORMAT_WAV&sampleRate=16&$query"
        set token [::http::geturl $url]
        set resultCode [getXMLResponseAttr [::http::data $token] resultCode]
        if {$resultCode != 0} {
                puts Error:[getXMLResponseAttr [::http::data $token] resultString]
                exit
        }

        set conversionNumber [getXMLResponseAttr [::http::data $token] conversionNumber]
        set statusCode 5
        while {$statusCode != 4} {
                set url "https://tts.neospeech.com/rest_1_1.php?method=GetConversionStatus&email=$email&accountId=$accountId&conversionNumber=$conversionNumber"
                set token [::http::geturl $url]
                set statusCode [getXMLResponseAttr [::http::data $token] statusCode]
                if {$statusCode == 5} {
                        puts statusCode=5
                        exit
                }
        }

        set downloadUrl [getXMLResponseAttr [::http::data $token] downloadUrl]
        return $downloadUrl
}

proc getXMLResponseAttr {xml_response attr} {

        set d [::dom::DOMImplementation parse $xml_response]
        set response [set [::dom::document getElementsByTagName $d response]]
        set attr [::dom::element getAttribute $response $attr]
        return $attr
}


};# end ns neospeech

proc getFile { url } {
       set token [::http::geturl $url]
       set data [::http::data $token]
       ::http::cleanup $token          
       return $data
}

set str {Let him now speak, or else hereafter for ever hold his peace.}

set voice_url [neospeech::getConvertSimpleVoice $str]

set file [getFile $voice_url]
set fname [lindex [split $voice_url /] end]
set f [open $fname w]
fconfigure $f -translation binary
puts -nonewline $f $file
close $f


set App [::tcom::ref createobject wmplayer.ocx]
#$App URL $voice_url
$App URL $fname
[$App controls] play
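
The request URLs in the script above are assembled by hand, so values such as the email address go into the query string unencoded. One possible tidy-up is to let ::http::formatQuery encode every parameter. neospeechUrl is a hypothetical helper; the endpoint is the one used above, and the parameter values shown are placeholders.

```tcl
package require http

# Build a request URL for the REST endpoint used above.
# args is a flat list of parameter names and values; formatQuery encodes each one.
proc neospeechUrl {method args} {
    set base https://tts.neospeech.com/rest_1_1.php
    return $base?[::http::formatQuery method $method {*}$args]
}

# Placeholder values, for illustration only.
puts [neospeechUrl GetConversionStatus email user@example.com conversionNumber 42]
```

The result can be handed to ::http::geturl exactly as the hand-built URLs are.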

chw 2021-02-21: Inspired by Tristan's tclespeak, I wrote a small ffidl/TclOO wrapper, which can be found at http://www.androwish.org/home/dir?name=undroid/espeak0.1

Category Speech Synthesis