Version 5 of Syllable Counting

Updated 2016-02-03 01:25:57 by kpv

Summary

WJG (02/02/16) Obtaining the number of syllables in an English word is quite tricky because spellings can be irregular. For many languages a simple vowel count would be sufficient, but in English even this throws up some inaccuracies. The following procedure shows a relatively easy approach to the problem.

* Remove initial 'y' (y is a semi-vowel and here acts as a consonant).

* Count the number of vowels (including y as a vowel).

* Reduce the count by the number of diphthongs.

* Reduce the count by silent vowel endings or modifying 'e's.

* If the total is less than 1, set it to 1: even a word with no vowels has one syllable (aspirated, e.g. psst!).
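
As a worked illustration of the steps above, here is a minimal sketch that traces them by hand for the single word colour. The helper countOf is hypothetical and is not part of the page's script. It relies only on the fact that regexp -all, called without match variables, returns the number of matches.

```tcl
# Hypothetical helper (an assumption for illustration only):
# count the non-overlapping occurrences of a pattern in a string.
proc countOf {pattern str} {
    return [regexp -all -- $pattern $str]
}

set word colour

# Step 2: count the vowels, treating y as a vowel.
set vowels 0
foreach v {a e i o u y} {
    incr vowels [countOf $v $word]
}

# Step 3: colour contains one diphthong, "ou".
set diphthongs [countOf ou $word]

# Step 4 does not apply: none of the silent-e patterns occur in colour.
puts "vowels=$vowels diphthongs=$diphthongs syllables=[expr {$vowels - $diphthongs}]"
# prints: vowels=3 diphthongs=1 syllables=2
```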

Code

#!/bin/sh
#\
exec tclsh "$0" "$@"

#---------------
# syllables.tcl
#---------------

#---------------
# Obtain number of syllables in an English word
#---------------
# Arguments:
#        str        word
# Returns:
#        number of syllables
#
proc syllables { str } {
        
        set res 0
        
        # strip any leading y: there it functions as a semi-vowel, i.e. as a consonant.
        set str [string trimleft $str y]
                
        # count total number of vowels
        foreach item {a e i o u y} {
                incr res [llength [regexp -all -inline (?=$item) $str]]
        } 
        
        # discount diphthongs, includes reversals
        foreach item {ai ie ei io ee ou oo oi ea ue ui} {
                incr res -[llength [regexp -all -inline (?=$item) $str]]
        } 

        # discount silent-e patterns typical of irregular word endings
        # (note: these are matched anywhere in the word, not only at the end)
        foreach item {ce nge me te ne ve re ye ue ze se eye} {
                incr res -[llength [regexp -all -inline (?=$item) $str]]
        } 

        # any word, even one with no vowels, has at least 1 syllable, e.g. psst!, shhh!
        if { $res < 1 } {
                set res 1
        } 

        return $res
}

        set words "
                colour allure yacht yahoo 
                yeti jeeze employees footy 
                early yearly psst phut 
                eye lye lie hectic 
                pneumatic aromatic automatic clinique"

        puts "syl.\tword\n[string repeat = 30]\n"        
        
        foreach word [lsort $words] {
                puts "[syllables $word]\t$word" 
        }

Comments

kpv A couple of weird English words:

  • resume => 1
  • perfume => 2
  • ague => 1
  • hope => 2
  • fire => 1
  • hour => 1
  • squirrel => 1

The words resume, ague and hope are definitely wrong, but it's debatable how many syllables fire, hour and squirrel have.
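
One reason resume comes out as 1 is that the ending patterns are matched anywhere in the word, so the "re" at the start of resume is discounted along with the "me" at the end. The following is a minimal sketch of a variant, not the original author's code, that anchors each ending pattern to the end of the word with "$". The name syllablesAnchored is hypothetical, and treating the list as end-of-word patterns only is an assumption about the intent of that step.

```tcl
# Variant of the syllables proc (hypothetical, for illustration):
# identical except that ending patterns only match at the end of the word.
proc syllablesAnchored { str } {
    set res 0

    # strip any leading y, which acts as a consonant there
    set str [string trimleft $str y]

    # count vowels, including y
    foreach item {a e i o u y} {
        incr res [llength [regexp -all -inline (?=$item) $str]]
    }

    # discount diphthongs
    foreach item {ai ie ei io ee ou oo oi ea ue ui} {
        incr res -[llength [regexp -all -inline (?=$item) $str]]
    }

    # discount silent-e endings; "\$" appends a literal $ so that the
    # regular expression is anchored to the end of the word
    foreach item {ce nge me te ne ve re ye ue ze se eye} {
        incr res -[regexp -- ${item}\$ $str]
    }

    if { $res < 1 } {
        set res 1
    }
    return $res
}

foreach word {resume squirrel hope ague} {
    puts "[syllablesAnchored $word]\t$word"
}
```

With this change resume and squirrel both give 2, because the "re" inside them is no longer discounted. hope still gives 2 and ague still gives 1: hope ends in "pe", which is not in the list, and the "ue" in ague is discounted twice, once as a diphthong and once as an ending. Fixing those would need changes to the pattern lists rather than to the anchoring.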