Arjen Markus This lengthy title is meant to introduce a subject for the mathematically interested. It has to do with replacing strings by other strings in a short text and suddenly finding memory has exhausted.
Superexponential growth in text manipulation This note discusses a small problem in the area of text manipulation. To get some feeling for this subject, it is perhaps good to know the circumstances under which it arose. I was trsting a small utility that would edit text files based on a few simple commands: you could replace strings that matched a certain pattern, by something else, you could insert the contents of another text file after a line containing a string pattern or replace that text by it. That was (and is) about all the utility did.
While I was testing it, I used the same small text file (some 20 lines of text) over and over again and though I had noticed that its size grew, I had not expected to run out of memory - at last the utility needed some 100 MB of memory all of a sudden.
This led me to consider the mathematics of the process in more detail. We can strip it down to an example like below:
Initial string: AbbAbb After one round: AAbbAbbbbAAbbAbbbb After two rounds: AAAbbAbbbbAAbbAbbbbAAAbbAbbbbAAbbAbbbbbbAAAbbAbbbb- AAbbAbbbbbbbbAAAbbAbbbbAAbbAbbbbAAAbbAbbbbAAbbAbbbbbb- AAAbbAbbbbAAbbAbbbbbbbb As you can see, it is growing very fast!
There are a number of variations possible:
Let us examine what happens in each of these cases:
N(k+1) = N(k) L(k+1) = L(k) + N m
So: N(k+1) = N(k) + N(k).n = (n+1) N(k) L(k+1) = L(k) + N(k) m The solution is: N(k) = (n+1)^(k-1) N(0) L(k) = L(0) + N(0) m ((n+1)^k -1)/n
N(k+1) = N(k) + N(k) .n - N(k) = n N(k) L(k+1) = L(k) + N(k) (m-1) And the solution is: N(k) = n^(k-1) N(0) L(k) = L(0) + N(0) m (n^k -1)/(n-1)
So: N(k+1) = N(k).N(k) = N(k)^2 L(k+1) = L(k) + N(k) (L(k)-1) = N(k) L(k) A closed solution for N(k) is easy to find: N(k) = N(0)^P(k) , P(k) = 2^k The solution for L(k) is found when you take the logarithm of the above equation: ln L(k+1) = ln L(k) + ln N(k) = ln L(k) +2^k ln N(0) Therefore: L(k) = L(0) + N(0)P'(k) , P'(k) = 2^(2^(k+1)-1)
N(k+1) = N(k) + N(k).N(k) = (1+N(k)) N(k) L(k+1) = L(k) + N(k).L(k) = (1+N(k)) L(k) As far as I know this problem has no closed solution.
Below is a table that compares these five problems (N(0)=n=2, L(0) = m = 6):
Iteration Case 1 Case 2 Case 3 Case 4 Case 5 k Nk, Lk Nk, Lk Nk, Lk Nk, Lk Nk, Lk --------- ------ ------ ------ ------ ------ 0 2, 6 2, 6 2, 6 2, 6 2, 6 1 2, 18 6, 18 4, 16 4, 12 6, 18 2 2, 30 18, 54 8, 36 16, 48 42, 126 3 2, 42 54, 162 16, 76 256, 748 1806, 5418 4 2, 54 162, 486 32, 156 65536, 196608 3263442, 9790326