Version 4 of string totitle

Updated 2005-07-18 19:28:21

string totitle string ?first? ?last?

Returns a value equal to string except that the first character in string is converted to its Unicode title case variant (or upper case if there is no title case variant) and the rest of the string is converted to lower case. If first is specified, it refers to the first char index in the string to start modifying. If last is specified, it refers to the char index in the string to stop at (inclusive). If first is given without last, last defaults to first, so only that one character is converted. first and last may be specified as for string index (for example, end or end-1).
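A short sketch of how the optional indices behave. The sample strings are made up for illustration, and the results in the comments assume the index handling described above (a lone first converts a single character):

```tcl
# No indices: title-case the first character, lowercase the rest.
puts [string totitle "hELLO wORLD"]            ;# Hello world

# Only first: just the character at index 6 is converted.
puts [string totitle "hello wORLD" 6]          ;# hello WORLD

# first and last: the range is title-cased as a unit.
puts [string totitle "hello wORLD" 6 end]      ;# hello World

# Indices may use the end-N form, as with string index.
puts [string totitle "hello wORLD" end-4 end]  ;# hello World
```

Characters outside the given range are returned unchanged.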


LES on July 18 2005: I made a little script to fix music file names, so that "foo bar.ogg" becomes "Foo Bar.ogg", for example. I do that with a foreach loop, checking conditions on each word and modifying them accordingly with string totitle. The loop helps me keep certain words untouched. I provided a few "noise words", so that "foo of the bar.ogg" becomes "Foo of the Bar.ogg". Great.

Now I just ran into this file named "foo bar (last act).ogg", and it becomes "Foo Bar (last Act).ogg". Of course, "(last" is not capitalized the way I wanted, because its first character is the parenthesis.

Does anyone here think that the string totitle command should handle that kind of exception on its own? Like, say...

 string totitle -ignore {( ) [ ] \{ \} " ' # @ & *} $string

... so that all these characters would be skipped when they appear at the beginning of a word, and the first actual letter would be uppercased?

LV At first blush, I agreed with you. However, then I tried this:

 % string totitle "This is not enough-let's get more"
 This is not enough-let's get more

and suddenly realized that I had misremembered how the command works. I was thinking that it acted on every word in the string. However, it only capitalizes the first character of the whole string (and lowercases the rest). So it's up to whoever writes the loop to decide what counts as a word. In that case, what I'd suggest is writing some code for tcllib that loops over the words of a string (with various options for defining what a word is).
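A minimal sketch of what such a word-looping helper might look like. The name titleWords and its wordPattern parameter are hypothetical (nothing like this is claimed to exist in Tcl or tcllib); the definition of a word is simply a regular expression supplied by the caller:

```tcl
# Hypothetical helper: title-case every substring of str that
# matches wordPattern.
# Assumes wordPattern contains no capturing parentheses, so that
# regexp -indices reports exactly one {first last} pair per word.
proc titleWords {str {wordPattern {\S+}}} {
    set result $str
    foreach range [regexp -all -inline -indices -- $wordPattern $str] {
        set first [lindex $range 0]
        set last  [lindex $range 1]
        # string totitle converts character by character, so indices
        # taken from the original string still line up in the result.
        set result [string totitle $result $first $last]
    }
    return $result
}

# Words delimited by whitespace (the default pattern).
puts [titleWords "this is not enough-let's get more"]
# -> This Is Not Enough-let's Get More

# Words made of letters and apostrophes, so the hyphen splits words.
puts [titleWords "this is not enough-let's get more" {[[:alpha:]']+}]
# -> This Is Not Enough-Let's Get More
```

The two calls show why the word definition has to be left to the caller: the same input gives different, equally defensible results.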

That case is even worse

MG Here's a proc which will run string totitle starting at the first character matching a specific regexp pattern (default is \w, for a word character), leaving any initial characters that don't match the pattern untouched:

 proc titleCase {word {pattern {\w}}} {
     # Find the first character that matches the pattern.
     if {[regexp -indices -- $pattern $word match]} {
         set num [lindex $match 0]
         # Pass "end" explicitly: with only a first index, string totitle
         # converts just that one character and leaves the rest as typed.
         return [string totitle $word $num end]
     } else {
         return $word
     }
 };# titleCase

Examples:

 % titleCase test
 Test
 % titleCase "(test)"
 (Test)
 % titleCase {$^"2}
 $^"2

So then you can just use something like

 set sentence "This is mY (test) !sentence!"
 set ignore [list is of for the]
 set finished [list]
 foreach x [split $sentence " "] {
     # -exact compares literally; the default glob matching would treat
     # characters such as [ ] * ? in a word as pattern syntax.
     if {[lsearch -exact $ignore $x] > -1} {
         lappend finished $x
     } else {
         lappend finished [titleCase $x]
     }
 }
 set sentence [join $finished " "]

Which gives you:

 This is My (Test) !Sentence!
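Applied to the file-name case that started this discussion, the same idea might be wrapped up as follows. This is only a sketch: titleWord, titleFileName and the default noise-word list are hypothetical names and values chosen for illustration, and the block defines its own helper so that it runs on its own:

```tcl
# Hypothetical helper: title-case a word starting at its first word
# character, leaving any leading punctuation as it is.
proc titleWord {word} {
    if {[regexp -indices -- {\w} $word match]} {
        return [string totitle $word [lindex $match 0] end]
    }
    return $word
}

# Hypothetical wrapper: title-case the root of a file name, leaving
# the extension and the listed noise words untouched.
proc titleFileName {name {noise {of the a an and for in}}} {
    set ext  [file extension $name]
    set root [file rootname $name]
    set out  [list]
    foreach word [split $root " "] {
        if {[lsearch -exact $noise $word] >= 0} {
            lappend out $word
        } else {
            lappend out [titleWord $word]
        }
    }
    return [join $out " "]$ext
}

puts [titleFileName "foo bar (last act).ogg"]   ;# Foo Bar (Last Act).ogg
puts [titleFileName "foo of the bar.ogg"]       ;# Foo of the Bar.ogg
```

One limitation of this sketch: a noise word stays lower case even when it is the first word of the name.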
