String Manipulation Functions

Introduction

This page is made to collect as many as possible of TCL procedures that play with the positions of the words of a string or the characters' position in a string while they don't exist in the manual. Add and share as many as you can. Please try to follow the formatting of this page for easy access.


Reversing The Order Of The Words

Author: Rani Fayez Ahmad Superlinux

The Given

We have a string like this:

 Rani is here too

We want to have it as :

 too here is Rani

Also regardless of the number of spaces between the words, we should do word order reversal.

The Solution

proc reverse_word_order { all_original_text } {
# 1- Reverse the orignial text
set reversed [ string reverse  $all_original_text] 

# 2- split the reversed string $reversed into a list of words then loop over them
set list_of_reversed_words [split $reversed  ]

foreach reversed_words $list_of_reversed_words {

# 3- find the indices of the extremes of each reversed word in the $reversed

set word_start [ string first $reversed_words $reversed $word_start]
set word_end [ expr $word_start -1 + [string length $letter] ]

# 4- reverse the current-in-the-loop reversed word back to its normal state, e.g: 
# if i have a word "loohcs" then convert it by reversing it to "school"

set original_word [string reverse [ string range $reversed $word_start $word_end] ]

# 5- replace  the reversed word (loohcs) with the correcte one (school)

set reversed [ string replace $reversed $word_start $word_end $original_word]

# 6- set the start-of-search index to the index 
# directly after the ending of the current word

set word_start [expr $word_end +1]

# 7-continue to the next loop  
}

#print the result
return $reversed
}

Another Simpler Solution

This uses regular expressions.

proc reverse_word_order {str} {

#1-Reverse the whole string

set $reversed_str [string reverse $str]

#2- split the reversed string into a list of words
set reversed_words [ split $reversed_str]

#3- reverse and replace  every word occurrence in the reversed string
foreach word $reversed_words {

      regsub -all "\\m$word\\M" $reversed_str [string reverse $word] reversed_str   


}

return $reversed_str
}

Modern Solution

Perhaps this might do?

   proc w-reverse { string } {    lreverse $string }

Ro Realistically this will break when your string doesn't evaluate as a valid tcl list.

jbr OK -- then how's this:

   proc w-reverse { string } { join [lreverse [regexp -inline -all -- {\S+} $string]] }

Get the list of all ASCII (Latin/Greek/Russian) sentences/strings

Author: Rani Fayez Ahmad Superlinux

The Given

Lets say we have a mixture of ASCII words and Unicode words in a string, for example Arabic and Latin words, and we want to extract continuous sequences of ASCII characters and put them into a TCL [list]. Also, in the result, add to the list the starting index of each ASCII part. Lets say we have this as an example input :

set y "hello all peoebla كنتمنبستيى  اتياتاتايب ما غغخعاتلا  jhjhfdyigrwyih hruiehbbqbcvvb jenhwcbebeمااخؤصىقلاقىاؤ تااقصىقرثالا  khwbvbrehb"

What I should get is this :

{{hello all peoebla } 0} {{  jhjhfdyigrwyih hruiehbbqbcvvb jenhwcbebe} 52} {{  khwbvbrehb} 121}

The Solution

proc list_of_all_ascii_parts_a_unicode_string { arabic_string} {

set ascii_parts_list [list]
set length [string length $arabic_string]
   for {set i 0} { $i< $length } {incr i } {
      set start_of_ascii $i
      
      
     
      set end_of_ascii  $start_of_ascii 
      while { [string is ascii [string range $arabic_string $start_of_ascii $end_of_ascii]] ==  1  && $i<$length }  {
      
      puts [string range $arabic_string $start_of_ascii $end_of_ascii]
      incr i
      incr end_of_ascii 
      }
      
      incr end_of_ascii -1
      
      set ascii_part  [ string range $arabic_string $start_of_ascii $end_of_ascii]
      
      if { [string trim $ascii_part] !="" } {
        
        set ascii_parts_list [ linsert $ascii_parts_list end [list $ascii_part $start_of_ascii ] ]
      
      }
    }
    
    return $ascii_parts_list
}

AMG: For history on this page dated 2013 and older, see [L1 ].