Version 0 of RFC5322 compliant mails with tcllib mime package

Updated 2010-05-14 11:32:15 by LinkelStephan

tcllib receive emails

Hi all! We have built a tcl daemon that receives and processes emails over pop3 (tcllib). all in all it's working great, but if we receive an email (in our case sent from thunderbird) that contains special characters (other than the 7-bit ascii characters) in the subject, the email header line is encoded like this:

 Subject: Re: Session =?ISO-8859-1?Q?l=E4uft_ab!_sid=23CBFE0=23?=

which is in fact correct, because RFC5322 says emails shall only contain 7-bit ascii chars. but the tcllib mime package does not decode this line. we tried it with tcllib 1.9 and 1.12 on Tcl 8.4

has anyone an idea how we could parse this ourselves? i did not find any info about this type of quoting. If it is interesting, here are some other Header lines:

 MIME-Version: 1.0
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit

btw, if we send emails over tcllib mime/smtp package, special characters aren't encoded as well, so it seems tcllib doesn't fully follow rfc5322.

simple solution

we have made a simple proc which works for our cases:

 proc decodeRFC5322 {string} {
         # if the line is to long, it is separated to more lines, each line beginning with a single space -> join the lines
         set string [string map {"\n " "" "\n" ""} $string]
         set result ""
         set encodings [encoding names]
         set lEncodings [string tolower $encodings]
         # the encoded text has the following format: =?<encoding>?Q?<encoded_text>?=
         # in <encoded_text> every non-7-bit-ascii character is encoded like =<hexcode> (<hexcode> is exactly 2 characters long)
         # additionally, spaces are replaced with underlines.
         # It's possible, that more than one encoding parts are in the same header
         while {[set startIndex [string first "=?" $string]] > -1} {
                 set delimiterIndex [string first "?Q?" $string $startIndex]
                 set endIndex [string first "?=" $string $startIndex]
                 # look, if the found indices match the pattern, otherwise break...
                 if {$delimiterIndex - 1 > $startIndex + 2 && $endIndex > $delimiterIndex + 3} {
                         if {$startIndex > 0} {
                                 append result [string range $string 0 [expr $startIndex -1]]
                         }
                         set encoding [string range $string [expr $startIndex + 2] [expr $delimiterIndex - 1]]
                         set encodedText [string range $string [expr $delimiterIndex + 3] [expr $endIndex - 1]]
                         set string [string range $string [expr $endIndex + 2] end]

                         set i -1
                         set lEncoding [string tolower $encoding]
                         # try to examine the encoding name
                         if {[set i [lsearch -exact $lEncodings $lEncoding]] == -1} {
                                 # e.g. the iso-encodings have a dash in the string, but in tcl's encoding list, not. perhaps this string map has to be extended in future?
                                 if {[set i [lsearch -exact $lEncodings [string map {"iso-" "iso"} $lEncoding]]] == -1} {
                                 }
                         }
                         # only if we found an encoding, we proceed... otherwise append the quoted text as is...
                         if {$i > -1} {
                                 set targetEncoding [lindex $encodings $i]
                                 # spaces are encoded with underlines
                                 set text2decode [string map {"_" " "} $encodedText]
                                 # subst [string map {"=" "\\x"} $text2decode] does not work, because \x possibly takes more than 2 following characters
                                 # => iterate over it...
                                 while {[regexp {^([^=]*)=([a-fA-F0-9]{2})(.*)$} $text2decode dummy pre code post]} {
                                         append result $pre
                                         append result [encoding convertfrom $targetEncoding [subst -nocommands -novariables "\\x$code"]]
                                         set text2decode $post
                                 }
                                 append result $text2decode
                         } else {
                                 append result "=?$encoding?Q?$encodedText?="
                         }
                 } else {
                         break
                 }
         }
         append result $string
         return $result
 }

So, with this proc we process the mails like this:

 set m [mime::initialize -string $mail]

 array set mailArray [list]

 set mailArray(source) $mail

 set headers {From To Subject Date}
 foreach h $headers {
         set mailArray($h) [list]
         if ![catch {set temp [mime::getheader $m $h]} e] {
                 set decoded [list]
                 foreach entry $temp {
                         lappend decoded [decodeRFC5322 $entry]
                 }
                 set mailArray($h) $decoded
         }
 }