Version 13 of How To Read Large Strings without Newlines

Updated 2011-06-02 17:50:45 by AMG

XAN: I've run into a kind of surprising limitation(?) and it has me stumped. Maybe someone else here has done something similar?

The Problem

I am attempting to use TclCurl to have a "conversation" with an ASP-driven webpage. A little background... these are corporate in-house webpages over a VPN and there is little to no hope of getting them changed. And there is a major design flaw in these webpages - the designer has decided to use the __VIEWSTATE field to save state for everything... including the kitchen sink! The result...

The stumbling block is a __VIEWSTATE field that is approximately 500K large! To communicate with the server, I have to capture this awful chunk of data and send it back in the POST request. I have code that does this, and I know that it works because it is heavily tested on pages with a sane-sized __VIEWSTATE. Trying to use the same code on a page with a large VIEWSTATE, however, just causes TCL to hang.
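
For readers unfamiliar with the round trip described above, here is a rough sketch of the capture-and-encode step. The HTML fragment, the proc name extractViewState, and the attribute order assumed by the regular expression are illustrative assumptions, not the actual page or XAN's code. ::http::formatQuery comes from Tcl's bundled http package and is used only to URL-encode the value before it is handed to whatever sends the POST.

```tcl
package require http

# Hypothetical helper: pull the __VIEWSTATE value out of a page's HTML.
# Assumes the name attribute precedes the value attribute in the same tag.
proc extractViewState {html} {
    if {[regexp {name="__VIEWSTATE"[^>]*value="([^"]*)"} $html -> value]} {
        return $value
    }
    return ""
}

# Made-up page fragment standing in for the real response body.
set html {<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="bGVk+IFdh/bmRh=" />}
set viewstate [extractViewState $html]

# URL-encode the field so "+", "/" and "=" survive the trip back to the server.
set postData [::http::formatQuery __VIEWSTATE $viewstate]
puts [string length $postData]
```

Base64 data contains "+", "/" and "=", all of which have meaning in a form-encoded body, so the encoding step matters even when the value looks like plain text.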

Possible Cause

I have no problem reading a file megabytes large into a variable, so something else is going on here. I looked for "offensive" characters that may be choking the interpreter but none seem to exist. The striking detail, it seems to me, is the absence of newlines. The ~500K is all on one line... could it be that TCL cannot handle this scenario, or handle it well?

Things I've Tried and The Outcome

  1. I've taken a chunk of about 10K and tried building a 500K string with no newlines using append. The interpreter does get slower and slower and eventually grinds to a halt. When doing this with newlines, no slowdown occurs.
  2. I've tried every possible method of reading this data into a variable...
  • different encodings and translations on fconfigure -> read into variable
  • set variable directly using copy/paste into console
  • breaking up into chunks and rebuilding these chunks into one variable
  3. I've looked for a way to have TclCurl pull the postfields from a file. No such method seems to exist - it must come from a variable.
  4. I've tried using external options like CMD.exe type filename. Same problem. Interestingly, CMD.exe has no problem typing this file out in a CMD.exe instance.

Questions

  1. Why can't TCL handle this data?
  2. What can I do? Anyone have any suggestions or similar experience?

AMG: I wasn't aware of any problems with long strings, even when they lack newlines. The computer I'm on right now takes about thirty milliseconds to assemble a one-megabyte string consisting only of the letter "x":

    time {string repeat x 1048576} 100

I suspect the problem isn't with Tcl or the way it handles long strings or newlines, but rather with the way data is exchanged with the TclCurl library. Profile your program with [time] and/or [clock microseconds] to see which command is taking the longest to complete, then split up that command and see which piece is taking too long, and keep repeating until you've isolated the culprit. Let us know what you find.
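
A minimal sketch of that bisection approach, using a made-up three-step script in place of the real program. The proc name timeStep and the step labels are hypothetical; only [clock microseconds] is standard Tcl here.

```tcl
# Hypothetical helper: run a script in the caller's scope and report elapsed time.
proc timeStep {label script} {
    set start [clock microseconds]
    uplevel 1 $script
    set elapsed [expr {[clock microseconds] - $start}]
    puts [format "%-10s %8d us" $label $elapsed]
    return $elapsed
}

# Stand-ins for the real steps (fetch the page, build the POST body, send it).
timeStep build  {set body [string repeat x 512000]}
timeStep append {set post "__VIEWSTATE="; append post $body}
timeStep length {set n [string length $post]}
```

Whichever step reports the largest time is the one to split further and measure again.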

There doesn't appear to be any documentation for TclCurl on this Wiki. Maybe write a sample program or two and post it so we can see how it works.


RLE (2011-06-01) Something seems odd with your example of creating a 500K string by appending a 10K string a number of times. Note:

 % set s [ string repeat a 10240 ]
 % string length $s
 10240
 % time { for {set i 0} { $i < 50 } {incr i } {append res $s } }
 292 microseconds per iteration
 % string length $res
 512000

$res is now a 500K string of the letter "a". 292 microseconds hardly seems like "grinds to a halt". Therefore, something else seems amiss somewhere.


XAN: Thanks for the input...

I've run the code on my system and come up with similar results.

(bin) 2 % string length $s
10240
(bin) 3 % 
(bin) 3 % 
(bin) 3 % time { for {set i 0} { $i < 50 } {incr i } {append res $s } }
1626 microseconds per iteration
(bin) 4 % string length $res
512000
(bin) 5 % 

I think that it takes slightly longer here because I'm working on an older Core2 laptop. So far so good - you've convinced me that the problem isn't likely to be with the newlines.

However, I don't think it's possible that TclCurl is involved. When I open up a clean interpreter and try to set a variable with the copy/pasted VIEWSTATE content, I get the same issue.

I may have been incorrect about the special characters in the VIEWSTATE... I'm going to take a closer look at the content and see if the interpreter could possibly be hanging up trying to do substitutions.
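
One way to make that check concrete is to count the characters Tcl treats specially when text is parsed as source, along with any characters outside the Base64 alphabet that a VIEWSTATE normally uses. The sample value below is invented. Note that data obtained with [read] is never subjected to substitution; only text parsed as a script or typed at the prompt is.

```tcl
# Invented sample; the real value would come from the page or a file.
set viewstate {QUJDREVGRw==}

# Characters that trigger substitution or grouping when parsed as Tcl source:
# ] [ $ \ { } "
set special [regexp -all {[][$\\{}"]} $viewstate]

# Characters outside the Base64 alphabet (A-Z, a-z, 0-9, +, /, =).
set nonBase64 [regexp -all {[^A-Za-z0-9+/=]} $viewstate]

puts "special: $special, non-Base64: $nonBase64"
```

For this sample both counts are zero. A nonzero first count on the real data would point toward substitution as a suspect when the value is pasted into a console.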


XAN:

The problem persists and I don't understand it...

1. I took several different 10K chunks out of the VIEWSTATE and plugged them into RLE's timing sequence above... the timing is similar and the operation completes each time. I am going to try this with the entire block, although a similar previous attempt failed.

2. I tried replacing any characters normally escaped or substituted by the interpreter. Loading the entire string still causes a hang.

3. I tried posting the entire VIEWSTATE to the wiki but that didn't work. A sample of the data looks like this:

...bGVkIFdhbmRhHjQ4MzfCoCAtIEEgR2FuZ2xhbmQgTG92ZSBTdG9yeRs0MjUzwqAgLSBBIEdvbGRlbiBDaHJpc3RtYXMmMjIx...

XAN:

I've gotten the code to work by reading in the VIEWSTATE and then reassembling this way:

    set fp [open c:/viewstate.txt r]
    fconfigure $fp -translation binary -buffersize 10240

    set CTR 0
    set data($CTR) [read $fp 10240]
    while {![eof $fp]} {
        incr CTR
        set data($CTR) [read $fp 10240]
    }
    close $fp

    set FINAL ""
    incr CTR
    for {set x 0} {$x<$CTR} {incr x} {
        append FINAL $data($x)
    }

When I tried breaking it up into chunks previously, I wasn't using -translation binary or setting an explicit buffer size. That is the only difference... but it makes a BIG difference. The whole process is now nearly instantaneous.

THANK YOU RLE for setting me off in the right direction!!! The help is greatly appreciated!

BTW - TCL ROCKS!!!


AMG: That works, but this doesn't?

    set fp [open c:/viewstate.txt]
    fconfigure $fp -translation binary
    set FINAL [read $fp]
    close $fp

Go ahead and give it a try. If it's also fast, try it without the [fconfigure] line, and let us know if that makes a difference. Also, note that I omitted the r argument to [open] because it is the default.

You can wrap your code and mine inside [time] if you want an easy way to measure it.
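
A sketch of such a comparison. It writes its own 500K single-line test file in the current directory (an assumption, so the example does not depend on c:/viewstate.txt), then times a chunked read modeled on XAN's version against a single [read]. The proc names and the file name are hypothetical.

```tcl
# Create a 500K test file with no newlines; the path is a stand-in for the real file.
set path [file join [pwd] viewstate_test.txt]
set fp [open $path w]
fconfigure $fp -translation binary
puts -nonewline $fp [string repeat a 512000]
close $fp

# Chunked read: 10K at a time, appended directly to the result.
proc readChunked {path} {
    set fp [open $path]
    fconfigure $fp -translation binary -buffersize 10240
    set result ""
    while {![eof $fp]} {
        append result [read $fp 10240]
    }
    close $fp
    return $result
}

# Single read of the whole file.
proc readWhole {path} {
    set fp [open $path]
    fconfigure $fp -translation binary
    set result [read $fp]
    close $fp
    return $result
}

puts "chunked: [time {set a [readChunked $path]} 10]"
puts "whole:   [time {set b [readWhole $path]} 10]"
puts "same data: [expr {$a eq $b}]"
file delete $path
```

Removing the -translation binary option from either proc and rerunning shows whether the translation setting alone accounts for any difference on a given system.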