Proposed solutions to the current problems, Discussion

AW's observations

I'm not clear on the current state of the 'current problems' (??) of this wiki; however, I see that:

- the 'Revisions' link at the bottom of each page currently leads to a page which says 'not implemented'. So I don't think vandalism can be corrected at all, and content will just be lost? Numerous parts/pages of the wiki still refer to this feature, explain how wikit implements it etc (e.g. rules and right link from the new homepage). If I understand correctly that this site is not running wikit anymore, and that this feature is therefore not available anymore, perhaps we should annotate those pages to say so?

EMJ - See WubWikit Problems - the revisions are being kept, they are just not yet available - this is just one of many things to be fixed.

- the experience for users not (yet) logged in is bad: you click edit, you get to a page with two edit boxes, one to log in (I don't want to) and one where you can edit. So you edit and click save, you go to a page where it says you should have logged in, and all your edits are lost. I know I did not care to repeat typing all that again (the browser's back button did not help). If you are not allowed to edit without being logged in, you should be told so before you edit anything; in fact, you should not be able to edit anything.

EMJ - I don't see why this is a problem, it clearly says "You must have a nickname to post here".

Duoas - It is a problem only because it messes with people's minds (it is inconsistent). One message says you can't post without a nickname. The other message says 'here is an edit box where you can make changes and here is a button labeled "save" to save your changes'. One of the top three rules in UI design is "People don't read anything (even if there's nothing else to read)".

Not that I consider it too great a problem. Once bitten you remember the next time you want to post. Also, I don't imagine most first-time posters plan to type in more than a couple hundred words if they're ambitious...

CMcC would like it to be known that (Josh) is a sock puppet of He Who May Not Be Named, unperson, Robert unperson, whose name is Legion. This is known technically and indubitably, and not merely by textual analysis (which nonetheless shows all the hallmarks of unperson-thought.)

[L1 ] and [L2 ] give you some good background on this rapscallion, this netkook, this irresistible force of a lesser nature.

== Information required ==

(Josh) Since the previous software running this site was working fairly well, why was it replaced in the first place? Could we have a little information on what's going on?

CMcC thought it was all public information.

The previous incarnation was swamped by spiders. It was designed with locks and a lock breaking mechanism with a timeout of 10 minutes. The spidering was such that the timeouts timed out on valid edits, enabling multiple edits to be partially committed, thus causing corruption of data.

It was considered a good idea to try to harden the server - to exclude rampant spiders. There was a mood to change, and change was necessary to prevent the inevitable repetition of the corruption.

After about a fortnight, nothing was being done, so I decided I'd be willing to port the backend to my Wub front end, on the basis that I needed to write some hardening of the type needed anyway, and that Wub shouldn't suffer from the same network issues.

In the absence of any practical alternatives or anyone to fix the problem in the time frame needed, that's what I chose to do. I get the benefit of testing Wub in a heavy duty application. The wiki gets to be hardened against attack and accident.

When someone comes up with a better working implementation, I'm more than happy to hand it off to them. To that end, the wikit stuff is currently in subversion, and will be constituted as its own project. I don't plan to do much in terms of extending wikit beyond the functionality it had, but some people like jdc seem to have plans and willingness to put them into action.

Feel free to contribute!

(Josh) Thanks Colin for your great efforts! Great attitude! Great spirit of initiative! You rolled up your sleeves, you spat in your hands and you moved forward. Way to go!

Unfortunately, in all modesty, I must admit that I am not versed enough in this sort of problem, but perhaps others are and they might contribute solutions. They might even be able to contribute an algorithm of some kind. I doubt it though, since you are playing in a very, very specialized area. But let's remain optimistic.

My 2 Euros: wouldn't it be a good idea to only let in participants with passwords, the same way it is done in the chat? This way vandals won't be able to vandalize the wiki and we could go back to the old Wikit? It seems to me (and to a lot of wiki webmasters) that times have changed: there are way too many cuckoos out there, so we cannot leave the gate open at night like we did in the old days. This wiki has always been very peaceful thanks to the fine and dedicated participants from all around the world; participants have never caused a single problem here, and vandals are the ones who screwed up the wiki. They shouldn't have access to the wiki in the first place. We should close the gate.

stevel Josh, this is a regular question and there's a regular answer ;)

Consider the analogy of a shop window. Occasionally you get vandalism, but the solution isn't to board up the shop window, but rather to replace the glass on the rare occasions it is smashed, and perhaps install some security lighting.

The wiki is Tcl's shop front and so we want to avoid boarding it up. One design goal of the wiki is to avoid barriers to people contributing (even if that means occasional vandalism). That's why we don't require passwords.

(Josh) Great answer! Thanks Steve. IMHO, in the absence of a technical solution, a password system could nevertheless be the solution. Is it possible to examine how other wikis have fixed the very same problem and what solutions were implemented? I am sure they must also have been attacked by spiders.

stevel No, the implementation of a password system is explicitly not what we want. This decision has been quite deliberate and well considered over a number of years. Passwords are a barrier to people contributing, and they don't stop spammers.

The spidering issue has been dealt with via a honeypot (visit wiki page 5 if you want to see it in action). Also, forcing people to register before editing means we get a cookie on their browser, so we can detect persistent spammers should that become necessary. And once the revisions are back working again it will be easier to restore after vandalism.

I'm not suggesting this system is perfect, but it is sufficient for now and preserves the open nature of this wiki. We could do a lot worse.

dkf: There are a number of ways to implement anti-spam measures, and it has been a long-standing policy of the Tcler's Wiki to avoid techniques that discourage contribution. Instead, we've used a policy in the past of relying on the community to spot spamming and revert it rapidly. There are a few technical components to support that policy not yet online, but experience shows that spam isn't a big problem with a large community of vigilant Recent Changes watchers. However, encouraging people to always contribute with a consistent ID (something which was not consistently done before) makes it easier to trace the activities of persistent and annoying scum and to put in place measures to deal with them as necessary.

(Josh) We certainly all trust you guys are doing the best for this wiki.

Interesting! I am new to this cookie security approach.

In the password system: a spammer (or a vandal) requests a password; he gets it, he spams and he keeps on spamming (or vandalizing) until his password is revoked. Then he gets a new password using another e-mail address and he starts the same behaviour over again.

Question: Is that what you're suggesting?

The apparent problem with your cookie approach is that you end up blocking complete IP networks just to stop one individual from spamming, whereas with the password system you only block one spammer at a time.

But then again I could very well be mistaken. Since the installed cookie cannot be edited, you could stop Joe Blow@142433, who won't be able to post from his computer anymore, while JamesK@142433 would still be able to post. Am I right?

Question: Can you actually stop anyone from deleting a cookie from his computer and getting a new one?

Went to page 5. Hey you need good eyes to read the characters. I am sure more than one honest participant will be caught in the web!

I also tried to make sense of what Colin wrote above:

It was designed with locks and a lock breaking mechanism with a timeout of 10 minutes. The spidering was such that the timeouts timed out on valid edits, enabling multiple edits to be partially committed, thus causing corruption of data.

It is very well written, well formulated but perhaps not clear enough for non-experts like me.

Question: What does this all mean? Can you provide examples? No offense! I am not trying to be mean or difficult or nosy; it's just that when I don't understand something I ask questions. That's how I learn. :-)

DKF: Two reasons why I'm not giving details:

- There are no fixed rules anyway; we can tune our response as we see fit. (Let the Kangaroo Court decide their fate!)
- I don't want to give spammers a recipe for working around our response.

However, an example might be to review all changes they made while using a particular cookie, and to ban logins from their subnet (perhaps making it look like an unfortunate server bug) since it is fairly easy to find that info out and Tclers are mostly fairly dispersed (i.e. not too much risk of collateral damage being harmful).

(Josh) Great answer! Hazy but reasonable. If you can develop strategies to counter them we are all for it. As you say, no need to get into details and give spammers and vandals recipes. I trust you can develop tools to counter them with the system you're putting in place and this is what counts.

I also trust the Kangaroo court will judge in good conscience! Kangaroos have been known to be great judges! :-) They jump around like all good judges do! :-) But do they fall asleep in the middle of a court session like judges sometimes do?

As for the mysterious paragraph ("It was designed with locks and a lock breaking mechanism with a timeout of 10 minutes. The spidering was such that the timeouts timed out on valid edits, enabling multiple edits to be partially committed, thus causing corruption of data."), any further enlightenment?

All in all, thanks so much to the Tclers' New Zealand and Australia connection (Oceania): Colin and Steve (from Digital Smarties) for your help in this difficult time.

And the light came! :-)

LV Josh, basically, the best guess as to what happened was that the code to prevent multiple people from trying to update the single wiki file failed, and the single file was corrupted. Looking at log files seemed to indicate that around the time of the problem, a large number of files were being requested by a particular address. The files included the edit page urls - each one of which caused the initiation of a timer, initially designed to give someone about 10 minutes to edit and submit a page. When the software timers ran out, the system began allowing requests for those pages to occur. It seems likely that multiple updates occurred during this time of losing page edit locks, resulting in corrupted data.
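
As a rough illustration of the failure LV describes, here is a minimal sketch in Python. It is not the Wikit code: the class `BreakableLock`, the `simulate` function and the two writers "A" and "B" are hypothetical, and time is passed in by hand so the example is deterministic. It only shows how a lock that may be broken after a timeout lets two writers interleave their writes when the first one is stalled longer than the timeout.

```python
TIMEOUT = 600  # ten minutes, in seconds (the figure quoted in the discussion)


class BreakableLock:
    """Advisory lock that any caller may break once it is older than `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.holder = None
        self.acquired_at = None

    def acquire(self, who, now):
        # A held lock counts as stale once it has been held for `timeout` seconds.
        stale = self.holder is not None and now - self.acquired_at >= self.timeout
        if self.holder is None or stale:
            self.holder = who
            self.acquired_at = now
            return True
        return False

    def release(self, who):
        # Only the current holder can release; a writer whose lock was
        # broken gets False back, but nothing forces it to check.
        if self.holder == who:
            self.holder = None
            self.acquired_at = None
            return True
        return False


def simulate():
    """Replay one hypothetical sequence of events on a shared store."""
    lock = BreakableLock(TIMEOUT)
    store = []

    lock.acquire("A", now=0)
    store.append("A:part1")

    # The server is swamped, so A is still mid-write 601 seconds later.
    # B finds the lock stale, breaks it, and writes a complete update.
    b_got_lock = lock.acquire("B", now=601)
    store.append("B:part1")
    store.append("B:part2")
    lock.release("B")

    # A resumes, unaware that its lock was broken, and finishes writing.
    store.append("A:part2")
    return b_got_lock, store
```

In this sketch the store ends up with A's update wrapped around B's, which is one way "partially committed" edits could corrupt a single shared file.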

(Josh) Thanks Larry! Now I get it! Or I am very close to getting it.

I presume the edit conflict algorithm works this way: when A clicks on the Edit button, he has 10 minutes to save his edit. He has more or less taken control of the page. During those ten minutes anyone else can click on the Edit button for this page, but if he tries to save it, he is shown the Edit conflict message and cannot save. Of course it would make more sense to disable the Edit button for a page under time-out, but that is not how it is done: generally, anyone can click Edit during the time-out, but nobody else can save before the 10-minute time-out ends.

When the spider attack occurred, the server went berserk and the edit conflict and the time-out code did not function properly (to say the least! In fact it was total chaos) therefore many edits have been damaged.

Considering the above, doesn't it make sense to put in place a mechanism that won't allow any user to make more than say 5 edits every 5 minutes? I have seen this done on other wikis and it works fairly well. This way no creep could ever be able to attack the wiki's server anymore! Simple!

I suspect this limited editing code was put in place on other wikis to counter such server attacks but also to curb the enthusiasm of certain obsessive-compulsive and other similar users who were posting way too often :-) It worked very well. I believe the name of this function was time-limited editing or timed edits. I remember now: the slow post function is what it was called.
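
For discussion purposes, the "slow post" idea could be sketched as a sliding-window limiter like the one below. This is only an illustration in Python, not code from Wikit, Wub, or any other wiki; the class name `SlowPostLimiter`, its parameters, and the idea of keying on a user ID are all assumptions. The defaults follow the "5 edits every 5 minutes" figure suggested above.

```python
from collections import defaultdict, deque


class SlowPostLimiter:
    """Allow at most `max_edits` saves per user within any `window` seconds."""

    def __init__(self, max_edits=5, window=300):
        self.max_edits = max_edits
        self.window = window
        # user id -> timestamps of that user's recently accepted saves
        self._history = defaultdict(deque)

    def allow(self, user, now):
        """Return True and record the save if `user` is under the limit."""
        recent = self._history[user]
        # Forget saves that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_edits:
            return False
        recent.append(now)
        return True
```

A save handler would call `limiter.allow(user_id, time.time())` and refuse the save when it returns False. Note that this limits saves only; it does nothing about requests that merely read or open pages.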

This being said, it was a very wise decision to close the wiki altogether to fix the problem, and it was another wise decision to go slowly but surely in fixing it.

I would appreciate comments concerning the possible implementation of this slow post solution.

This timed-edits solution coupled with the TCL's Oceania Connection's efforts to strengthen the server should bring an excellent solution to the current problem, I believe.

The implementation of the slow posts could be made even simpler with the implementation of the current cookie system so we would go full circle and provide an excellent solution.

The Oceania connection is fortifying the castle so that it could resist the enemy's attacks. Excellent! But it would also make a lot of sense to make sure the enemies can't use the road to the castle!

With such a protection, I fail to see how a spider could cause any strain on the server since he couldn't make more than 5 posts every 5 minutes! Hopefully the code will be so good that it will cause strain on the creep's computer if he tries to spider us. That I would like! :-) And I'm sure I wouldn't be the only one! :-)

Problem caused by the download of the wiki snapshot

Lars H: I think there is something wrong with LV's explanation. Clicking "Edit" doesn't lock the page, it only gives you a page with (a) an Edit-box, (b) "Save" and "Cancel" buttons, and (c) a hidden piece of data recording the version (basically the time it was last saved) of the page being edited. Locking the page only happens when you click "Save", and the wiki is only changed if the current version of the page when you're saving is the same version that you checked out (if they're different, you get an "Edit conflict" page). Spiders requesting Edit pages should therefore not be a problem (unless they also start clicking "Save", but that seems unlikely since it's an entirely different operation on the HTTP level). Restricting the number of edits per user would therefore not make sense.
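
The scheme Lars H describes is a form of optimistic concurrency control, and a minimal sketch of it might look like the following. This is Python written for illustration, not the Wikit implementation: `PageStore`, `EditConflict`, and the use of an integer counter as the version (Lars describes it as basically the time of the last save) are assumptions.

```python
import threading


class EditConflict(Exception):
    """Raised when the page changed between checkout and save."""


class PageStore:
    def __init__(self):
        self._pages = {}                # page name -> (version, text)
        self._lock = threading.Lock()   # held only for the duration of a save

    def checkout(self, name):
        """Return (version, text) for the Edit form; takes no lock."""
        return self._pages.get(name, (0, ""))

    def save(self, name, base_version, new_text):
        """Store new_text only if the page is still at base_version."""
        with self._lock:
            current_version, _ = self._pages.get(name, (0, ""))
            if current_version != base_version:
                raise EditConflict(name)
            new_version = current_version + 1
            self._pages[name] = (new_version, new_text)
            return new_version
```

Under this model, requesting an Edit page any number of times changes nothing on the server, which is the basis of Lars' argument that spiders fetching Edit pages should be harmless.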

I've seen it claimed that the spidering problem was rather due to downloads of wiki snapshots, since generating a snapshot requires (required?) locking the entire wiki, and could add up to enough time that locks set by ordinary editing operations in progress were broken. It would be interesting to see an explanation of how the switch to Wub addresses this issue...

(Josh) Larry's explanation was excellent. It might have been my interpretation that wasn't quite right.

In fact you might very well be mistaken yourself, Lars.

I believe the algo works this way or somewhere along those lines:

1. A user requests a page. Let's say Ask and you shall be given # 5

2. He edits it, he wants to save it. He clicks on Save.

3. The program looks for this page in the Pages requested for save database.

4. It gets a clearance. No one has requested to save this particular page; no time-out for it. The ten-minute page time-out starts.

Now someone comes during the ten-minute delay and requests the very same page. Fine. He clicks on edit. He sees the edit box, the Save button, the cancel etc. When he wants to save, the process described above starts (at Step 3). The page is in the Timed-out pages database. No way to save it. You get an edit conflict.

What the creep did is probably request hundreds of pages for editing and saving thus screwing the database and the code.

Therefore my suggestion makes a lot of sense. If he was on slow posts, he wouldn't have been able to save more than 5 pages every 5 minutes and all his saving requests would have been rejected except one or two.

Lars' question; Josh's solution

In any case, please do not forget to address Lars' issue as well: Lars writes:

(Josh) Larry has written that: Looking at log files seemed to indicate that around the time of the problem, a large number of files were being requested by a particular address.

Requested for what? For editing?

In any case, I would solve the current problem in the following manner:

1. I would put back Wikit

2. I would write and implement the slow posts mechanism after getting the algorithm from a wiki webmaster who has implemented it on his own wiki. Why reinvent the wheel?

We'd post on the Wikit with the slow posts mechanism added.

3. Colin would continue his work on the Wub peacefully. When it is ready, it would replace the Wikit + the slow posts function if need be.

[Category Wikit|Category Discussion]