Version 6 of Syllable Counting

Updated 2016-02-04 20:47:26 by WJG

Summary

WJG (02/02/16) Obtaining the number of syllables in an English word is quite tricky because spellings can be irregular. For many languages a simple a vowel count would be sufficient but even this will throw up some inaccuracies in English. The following procedure shows a relatively easy approach to the problem.

* Remove initial 'y' (y is a semi-vowel and here acts as a consonant).

* Count the number of vowels (including y as a vowel).

* Reduce the count by the number of dipthongs.

* Reduce the count by silent vowel endings or modifying 'e's.

* If the total is less than 1, must be 1. (Aspirated, eg. psst!)

Code

#---------------
# syllables.tcl
#---------------
#!/bin/sh
#\
exec tclsh "$0" "[email protected]"

#---------------
# Obtain number of syllables in an English word
#---------------
# Arguments:
#        str word
# Returns:
#        number of syllables
#
proc syllables { str } {
        
        set res 0
        
        # functions as a semi-vowel, i.e. as a consonant.
        set str [string trimleft $str y]
                
        # count total number of vowels
        foreach item {a e i o u y} {
                incr res [llength [regexp -all -inline (?=$item) $str]]
        } 
        
        # discount dipthongs, includes reversals
        foreach item {ai ie ei io ee ou oo oi ea ue ui} {
                incr res -[llength [regexp -all -inline (?=$item) $str]]
        } 

        # discount irregular word endings, typically containing e
        foreach item {ce nge me te ne ve re ye ue ze se eye} {
                incr res -[llength [regexp -all -inline (?=$item) $str]]
        } 

        # any word, even if it has no vowels will have at least 1 syllable, eg. psst!, shhh!
        if { $res < 1 } {
                set res 1
        } 

        return $res
}

        set words "
                colour allure yatch yahoo 
                yeti jeeze employees footy 
                early yearly psst phut 
                eye lye lie hectic 
                pneumatic aromatic automatic clinique"

        puts "syl.\tword\n[string repeat = 30]\n"        
        
        foreach word [lsort $words] {
                puts "[syllables $word]\t$word" 
        }

Comments

kpv couple of weird English words

  • resume => 1
  • perfume => 2
  • ague => 1
  • hope => 2
  • fire => 1
  • hour => 1
  • squirrel => 1

The words resume, ague and hope are definitely wrong, but it's debatable how many syllables fire, hour and squirrel have.

proc syllables:string_occurences {sub str} {

    set j [string first $sub $str 0]
   
    if {$j == -1} { return 0 }

    set res 1
    
    while 1 {
        set j [string first $sub $str [incr j]]
        if { $j == -1 } { break }
                incr res                
    }

    return $res
}

proc syllables { str } {
    
    set dipthongs [list ope eu aa ae ai ao au ea ee ei eo eu ia ie ii io oa oe oi oo ua ue uee ui uu ya ye yea]
    set emods [list ate eye ife ive]
    set irregulars [list uer gue]
    set wordends [list ought ough ound ate eye ely nge uer our es ey ys ce de ke me ne pe te ve re ue ze se uy]
    
        
    set res 0
                        
    # count total number of vowels
    foreach item $vowels {
        incr res [syllables:string_occurences $item $str]
    } 
    
    # discount dipthongs, includes reversals
    foreach item $rhyme::emods {
        incr res -[rhyme::string_occurences $item $str]
    }
                
    # discount dipthongs
    foreach item $dipthongs {
        incr res -[syllables:string_occurences $item $str]
    } 

    # discount irregular word endings, typically containing e
    foreach item $wordends {
        set a [expr [string length $item] -1]
        set b [string range $str end-$a end]
        if { $b == $item} {
            incr res -1
            break
        }
    } 

    # discount for irregulars
    foreach item $irregulars {
        incr res [syllables:string_occurences $item $str]
    } 

    # any word, even if it has no vowels will have at least 1 syllable, eg. psst!, shhh!
    if { $res < 1 } {
        set res 1
    } 

    return $res
}