Syllable Counting

Summary

WJG (02/02/16) Obtaining the number of syllables in an English word is quite tricky because spellings can be irregular. For many languages a simple a vowel count would be sufficient but even this will throw up some inaccuracies in English. The following procedure shows a relatively easy approach to the problem.

* Remove initial 'y' (y is a semi-vowel and here acts as a consonant).

* Count the number of vowels (including y as a vowel).

* Reduce the count by the number of dipthongs.

* Reduce the count by silent vowel endings or modifying 'e's.

* If the total is less than 1, must be 1. (Aspirated, eg. psst!)

Code

#---------------
# syllables.tcl
#---------------
#!/bin/sh
#\
exec tclsh "$0" "$@"

#---------------
# Obtain number of syllables in an English word
#---------------
# Arguments:
#        str word
# Returns:
#        number of syllables
#
proc syllables { str } {
        
        set res 0
        
        # functions as a semi-vowel, i.e. as a consonant.
        set str [string trimleft $str y]
                
        # count total number of vowels
        foreach item {a e i o u y} {
                incr res [llength [regexp -all -inline (?=$item) $str]]
        } 
        
        # discount dipthongs, includes reversals
        foreach item {ai ie ei io ee ou oo oi ea ue ui} {
                incr res -[llength [regexp -all -inline (?=$item) $str]]
        } 

        # discount irregular word endings, typically containing e
        foreach item {ce nge me te ne ve re ye ue ze se eye} {
                incr res -[llength [regexp -all -inline (?=$item) $str]]
        } 

        # any word, even if it has no vowels will have at least 1 syllable, eg. psst!, shhh!
        if { $res < 1 } {
                set res 1
        } 

        return $res
}

        set words "
                colour allure yatch yahoo 
                yeti jeeze employees footy 
                early yearly psst phut 
                eye lye lie hectic 
                pneumatic aromatic automatic clinique"

        puts "syl.\tword\n[string repeat = 30]\n"        
        
        foreach word [lsort $words] {
                puts "[syllables $word]\t$word" 
        }

Comments

kpv couple of weird English words

  • resume => 1
  • perfume => 2
  • ague => 1
  • hope => 2
  • fire => 1
  • hour => 1
  • squirrel => 1

The words resume, ague and hope are definitely wrong, but it's debatable how many syllables fire, hour and squirrel have.

WJG (04/06/16) KPV, thanks for the comments. Here's a revised method which is more effective. This time regexp has be replaced with a custom proc. This is because regexpr allowed some count errors to occur as it didn't offer an exact match (perhaps it might do, but I'm no regexp expert). Also, because there are so many archaic spellings in English, the various filters may need further 'massaging' to get the results expected. Where did you a 'ague' from? It was a good test. This revised version will also differentiate between French load words such as 'resume' and 'resumé'.

proc syllables:string_occurences {sub str} {

    set j [string first $sub $str 0]
   
    if {$j == -1} { return 0 }

    set res 1
    
    while 1 {
        set j [string first $sub $str [incr j]]
        if { $j == -1 } { break }
            incr res                
    }

    return $res
}

proc syllables { str } {
    
    set dipthongs [list ope eu aa ae ai ao au ea ee ei eo eu ia ie ii io oa oe oi oo ua ue uee ui uu ya ye yea you]
    set emods [list ate eye ife ive]
    set irregulars [list uer gue]
    set wordends [list ought ough ound ate eye ely nge uer our he es ey ys ce de ke me ne pe te ve re ue ze se uy]
    
    set str [string tolower $str]
    
    set res 0
                        
    # count total number of vowels
    foreach item $vowels {
        incr res [syllables:string_occurences $item $str]
    } 
    
    # discount any modifying e
    foreach item $rhyme::emods {
        incr res -[rhyme::string_occurences $item $str]
    }
                
    # discount dipthongs
    foreach item $dipthongs {
        incr res -[syllables:string_occurences $item $str]
    } 

    # discount irregular word endings, typically containing e
    foreach item $wordends {
        set a [expr [string length $item] -1]
        set b [string range $str end-$a end]
        if { $b == $item} {
            incr res -1
            break
        }
    } 

    # discount for irregulars
    foreach item $irregulars {
        incr res [syllables:string_occurences $item $str]
    } 

    # any word, even if it has no vowels will have at least 1 syllable, eg. psst!, shhh!
    if { $res < 1 } {
        set res 1
    } 

    return $res
}