Version 29 of Speech Synthesis, or Talk to me Tcl

Updated 2009-12-11 19:55:21 by jdc

It hurts me to advocate Windows, but using tcom with the free speech engine from Microsoft is almost too easy (yes, I know "Microsoft" and "free speech" sound like an oxymoron :)

 package require tcom

 set voice [::tcom::ref createobject Sapi.SpVoice]

 $voice Speak "Hello World" 1

 after 3000
 exit

The speech SDK is available from http://www.microsoft.com/speech/download/sdk51/ (51MB). There is also a beta of .Net Speech at http://www.microsoft.com/speech/ (200MB), but I haven't tried that yet. - VPT

MG Anyone happen to have a link for more info on this? I've been using it successfully for a couple of weeks in something, but someone just found an error -

  $voice Speak "<> test <" 1

returns "0x80045042 {Unknown error}". I did a search on the Microsoft website for the code which, naturally, came up blank. Any ideas would be appreciated :)

NEM On Mac OS X there is a command-line "say" program for accessing the built-in speech synthesis capabilities of the OS:

 exec /usr/bin/say "Hello World"

GPS Rsynth is a nice public domain package that I've used for speech synthesis. I wrote a say_this.tcl script that built a GUI for tweaking the voice. I've been thinking about improving Rsynth, because it seems to be dead at the moment. Another tool that I've heard good things about is Festival[L1 ]. CMU's Sphinx[L2 ] is another tool that may be good, but I haven't heard from any users of it.

See Festtcl for a Tcl interface to Festival.


Tcl'ers may be interested in the CSLU Toolkit [L3 ] developed at the Oregon Graduate Institute's Center for Spoken Language Understanding. It includes a RAD environment which supports Tcl and provides tools to do Speech Recognition as well as speech synthesis. -- aricb


JKM would like to know if MG found any solution to his "0x80045042 {Unknown error}". I'm getting the same error, but it may be for a different reason. By default, the SAPI tries to interpret the string with XML tags if the first character is '<'. Change your 1 to a 17 and you should be alright. I'm getting the same error with

  $voice Speak "c:/code/tcl/SAPI2.txt" 5

any help would be appreciated.

MG never did, I'm afraid

MG Having looked on the Microsoft website (for once, searching for "sapi.spvoice" on www.microsoft.com actually returned something sensible), it seems that '5' means "speak a file", the 17 you mentioned before means "speak without parsing XML", and 1 is the default. I can replicate the "Unknown error" using '5' when the file in question doesn't exist. To quote one page of the MS website, the argument must be "a null-terminated, fully qualified path to a file". The page in question is [L4 ], and seems (as of July 10 2005, before they change the address) to be about the first page to start looking at for this method of speech.
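
Since the file variant fails with nothing more helpful than "Unknown error" when the file is missing, it may be worth checking the path in Tcl before handing it to Speak. What follows is only a minimal sketch: the helper name speakablePath is made up for illustration (it is not part of SAPI or tcom), and it uses nothing but Tcl's built-in file command.

```tcl
# Hypothetical helper (not part of SAPI or tcom): turn a possibly relative
# path into a fully qualified native path, or raise a readable error if
# the file does not exist.
proc speakablePath {path} {
    set full [file normalize $path]
    if {![file isfile $full]} {
        error "cannot speak \"$path\": no such file"
    }
    return [file nativename $full]
}

# Usage, assuming $voice was created as shown at the top of the page:
#   $voice Speak [speakablePath SAPI2.txt] 5
```

With this in place a missing file produces a Tcl error that names the file, rather than 0x80045042.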


ET: Hey, this is pretty cool. I also downloaded the documentation and found these flags, so the 5 would be 4+1 or async and filename:

 Enum SpeechVoiceSpeakFlags
    'SpVoice flags
    SVSFDefault = 0
    SVSFlagsAsync = 1
    SVSFPurgeBeforeSpeak = 2
    SVSFIsFilename = 4 
    SVSFIsXML = 8
    SVSFIsNotXML = 16
    SVSFPersistXML = 32

    'Normalizer flags
    SVSFNLPSpeakPunc = 64

  End Enum

SVSFDefault

        Specifies that the default settings should be used. The defaults are: 
        To speak the given text string synchronously (override with SVSFlagsAsync), 
        Not to purge pending speak requests (override with SVSFPurgeBeforeSpeak), 
        To parse the text as XML only if the first character is a left-angle-bracket (override with SVSFIsXML or SVSFIsNotXML), 
        Not to persist global XML state changes across speak calls (override with SVSFPersistXML), and 
        Not to expand punctuation characters into words (override with SVSFNLPSpeakPunc). 

SVSFlagsAsync

        Specifies that the Speak call should be asynchronous. That is, it will return immediately after the speak request is queued. 

SVSFPurgeBeforeSpeak

        Purges all pending speak requests prior to this speak call. 

SVSFIsFilename

        The string passed to the Speak method is a file name rather than text. 
        As a result, the string itself is not spoken; instead,
        the file that the path points to is spoken.

SVSFIsXML

        The input text will be parsed for XML markup. 

SVSFIsNotXML

        The input text will not be parsed for XML markup. 

SVSFPersistXML

        Global state changes in the XML markup will persist across speak calls. 

SVSFNLPSpeakPunc

        Punctuation characters should be expanded into words (e.g. "This is it." would become "This is it period"). 
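
The magic numbers used earlier on this page (5 and 17) are just these flags OR-ed together. As a rough sketch, the values from the enum can be kept in an array and combined by name. The array name SVSF, the shortened key names, and the helper speakFlags are all invented here for illustration; only the numeric values come from the enum above.

```tcl
# Values copied from the SpeechVoiceSpeakFlags enum; the short key names
# are this sketch's own (e.g. "Async" stands for SVSFlagsAsync).
array set SVSF {
    Default          0
    Async            1
    PurgeBeforeSpeak 2
    IsFilename       4
    IsXML            8
    IsNotXML         16
    PersistXML       32
    NLPSpeakPunc     64
}

# Hypothetical helper: OR together any number of flag names.
proc speakFlags {args} {
    global SVSF
    set flags 0
    foreach name $args {
        if {![info exists SVSF($name)]} {
            error "unknown speak flag \"$name\""
        }
        set flags [expr {$flags | $SVSF($name)}]
    }
    return $flags
}

puts [speakFlags Async IsFilename]   ;# prints 5
puts [speakFlags Async IsNotXML]     ;# prints 17
```

Assuming $voice exists, the call that failed above could then be written as $voice Speak "<> test <" [speakFlags Async IsNotXML].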

ET I found the following additional commands:

 $voice Rate       ;# return the rate (can be changed while reading a file async)
 $voice Rate value ;# set the rate of speech. Seems to take values like, -5,-4,...0...1...5,6,...

 $voice Skip Sentence N ;# N is plus or minus, this is while it's reading a file

 $voice Volume     ;# return volume
 $voice Volume N   ;# set volume to N

 $voice Pause      ;# these 2 work while reading a file, using option 5 (async, file)
 $voice Resume

There are others that seem to return a handle, but I can't figure out how to then use them. The one I wanted to play with was Voice. You can do this:

 set v [$voice Voice]
 $v

which then gives you

usage: handle ?options? method ?arg ...?

But I couldn't figure out what to do next. Too bad it doesn't return a list of permitted options like many Tcl commands do.

MG has made a little headway.

  set v [$voice Voice]
  puts "You are speaking with [$v GetDescription]"

Which, for me, shows

  You are speaking with Microsoft Sam

which is the default voice on my computer. You can also use

  set current [$voice Voice]
  set list [$voice GetVoices]
  set howmany [$list Count]
  for {set i 0} {$i < $howmany} {incr i} {
      set person [$list Item $i]
      set gender [$person GetAttribute Gender]
      puts "Voice $i is called [$person GetDescription], and is $gender."
      puts "Let's make [expr {$gender eq "Female" ? "her" : "him"}] speak."
      $voice Voice $person
      $voice Speak "Hello, my name is [$person GetAttribute Name]"
  }
  $voice Voice $current ;# return to original

to get a list of all the voices you have, and show their name/gender. ($person GetDescription is the same as $person GetAttribute Name).

As you can see, you can change the voice by using

  $voice Voice <newVoice>

where <newVoice> is a voice of the form [[$voice GetVoices] Item $number]
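
Picking a voice by index is awkward when you only know part of its name. One possible approach, sketched below, is a small search over the collection using only the Count, Item and GetDescription methods already shown on this page. The proc name findByDescription is made up for this example, and it should work on any collection handle that supports those three methods.

```tcl
# Hypothetical helper: return the first item in a SAPI-style collection
# whose description matches a glob pattern (case-insensitive), or an
# empty string if nothing matches.
proc findByDescription {collection pattern} {
    set n [$collection Count]
    for {set i 0} {$i < $n} {incr i} {
        set item [$collection Item $i]
        if {[string match -nocase $pattern [$item GetDescription]]} {
            return $item
        }
    }
    return ""
}

# Usage, assuming $voice was created with tcom as shown earlier:
#   set sam [findByDescription [$voice GetVoices] "*sam*"]
#   if {$sam ne ""} { $voice Voice $sam }
```

Returning an empty string rather than raising an error leaves it to the caller to decide whether a missing voice is a problem.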

It's also possible to control which device the sound goes to, in much the same way as setting the voice:

  set current [$voice AudioOutput]
  set devices [$voice GetAudioOutputs]
  set howmany [$devices Count]
  for {set i 0} {$i < $howmany} {incr i} {
      set thisdevice [$devices Item $i]
      puts "Device $i is called '[$thisdevice GetDescription]'. Let's play something through it..."
      $voice AudioOutput $thisdevice
      $voice Speak "I am speaking through [$thisdevice GetDescription]"
  }
  $voice AudioOutput $current ;# return to default

Each async speak (of a file or a string) gets queued and will be spoken in turn. To wait until speaking is done, one can use this:

 $voice WaitUntilDone <timeout-milliseconds> ;# 0 returned if a timeout occurred and still speaking, otherwise a 1 if finished

If timeout milliseconds is -1 this will just wait until the voice is done speaking, but this would then hang the event loop until the speech is done. Using a small value (e.g. 10), one can do a polling operation.
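
The polling idea can be sketched with after, so that the event loop keeps running between checks. The proc name pollUntilDone and its arguments are invented for this example; the only SAPI call it relies on is WaitUntilDone as described above, passed in as a command so the helper itself stays generic.

```tcl
# Hypothetical helper: run checkCmd every $interval ms until it returns
# true, then run doneCmd. Both are evaluated at global level, and the
# event loop stays free between checks.
proc pollUntilDone {checkCmd doneCmd {interval 100}} {
    if {[uplevel #0 $checkCmd]} {
        uplevel #0 $doneCmd
    } else {
        after $interval [list pollUntilDone $checkCmd $doneCmd $interval]
    }
}

# Usage, assuming $voice exists and an async Speak has been queued:
#   pollUntilDone [list $voice WaitUntilDone 10] {puts "finished speaking"}
```

Each check still blocks for up to the WaitUntilDone timeout (10 ms in the usage line), so that value should stay small.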


TLT Here is a simple script that implements talking Caller ID using a modem:

  # This script opens a Caller ID-enabled modem and speaks the received name after every ring.

  package require Tk   ;# the script uses button and wm
  package require tcom

  set Modem com8: ;# modem port
  set Name  ""    ;# current name to speak

  # This procedure responds to messages from the modem. 
  proc modemCallback {chan voice} {

      # Read the message from the modem.    
      set line [gets $chan]
      puts $line

      # If a ring, speak the current name and start a timer to reset the name.
      if {$line == "RING" && $::Name != ""} {
          $voice Speak $::Name
          set cmd {set ::Name ""}
          after cancel $cmd
          after 6000 $cmd
          return
      }

      # Get the name from the "NAME=" line.
      if {![regexp {NAME=(.*)} $line {} name]} {
          return
      }

      # Speak the name.
      set ::Name $name
      $voice Speak $::Name
  }

  # Create a button to show the console.
  pack [button .button -text "Show Console" -command "console show"]
  wm iconify .

  # Create the voice object.
  set voice [::tcom::ref createobject Sapi.SpVoice]

  # Open the modem.  The procedure "modemCallback" will be called when a message is received.
  set chan [open $::Modem r+]
  fconfigure $chan -buffering line
  fileevent $chan readable [list modemCallback $chan $voice]

  # Reset the modem and enable Caller ID.
  puts $chan "ATZ"
  puts $chan "AT+VCID=1"

peterc 2008-11-26: The sound sample quality of the default voice is pretty horrible; today's mobile phones are clearer. Does Microsoft (or any third party) offer better samples anywhere?


TLT This script saves the spoken text as a wave file:

  package require tcom

  # Create the objects.
  set voice [::tcom::ref createobject Sapi.SpVoice]
  set fileStream [::tcom::ref createobject Sapi.SpFileStream]
  set audioFormat [::tcom::ref createobject Sapi.SpAudioFormat]

  # Set the audio format of the file stream.
  set SAFT11kHz8BitMono 8
  $audioFormat Type $SAFT11kHz8BitMono
  $fileStream Format $audioFormat

  # Open the file and attach the stream to the voice.
  set SSFMCreateForWrite 3
  $fileStream Open text.wav $SSFMCreateForWrite False
  $voice AudioOutputStream $fileStream

  # Speak the text.
  $voice Speak "This is a wave file"
  $fileStream Close