== '''Information required''' ==

(Josh) Since the previous software running this site was working fairly well, why was it replaced in the first place? Could we have a little information on what's going on?

[CMcC] thought it was all public information. The previous incarnation was swamped by spiders. It was designed with locks and a lock breaking mechanism with a timeout of 10 minutes. The spidering was such that the timeouts timed out on valid edits, enabling multiple edits to be partially committed, thus causing corruption of data. It was considered a good idea to try to harden the server, to exclude rampant spiders. There was a mood to change, and change was necessary to prevent the inevitable repetition of the corruption.

After about a fortnight nothing had been done, so I decided I'd be willing to port the backend to my [Wub] front end, on the basis that I needed to write that kind of hardening anyway, and that Wub shouldn't suffer from the same network issues. In the absence of any practical alternatives, or of anyone to fix the problem in the time frame needed, that's what I chose to do.

I get the benefit of testing Wub in a heavy-duty application. The wiki gets to be hardened against attack and accident. When someone comes up with a better working implementation, I'm more than happy to hand it off to them. To that end, the wikit stuff is currently in Subversion, and will be constituted as its own project. I don't plan to do much in terms of extending wikit beyond the functionality it had, but some people like [jdc] seem to have plans and the willingness to put them into action. Feel free to contribute!
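To make the failure mode concrete, here is a minimal sketch of a lock-with-timeout scheme of the kind described above. It is illustrative only, not the actual Wikit code; the proc names, variables, and the 600-second figure are invented for the example.

 # Illustrative sketch only -- not the actual Wikit implementation.
 set timeout 600                ;# lock lifetime: 10 minutes
 array set lockTime {}          ;# page id -> when its lock was taken

 proc lockPage {page} {
     global lockTime timeout
     set now [clock seconds]
     if {[info exists lockTime($page)] && $now - $lockTime($page) < $timeout} {
         return 0               ;# a live lock exists: refuse the edit
     }
     # No lock, or a stale one: break it and take over.  Under a spider
     # flood a *valid* editor's lock can go stale mid-edit, so a second
     # writer gets in here while the first is still working.
     set lockTime($page) $now
     return 1
 }

 proc savePage {page text} {
     global lockTime
     # ... append the edit to the single shared data file ...
     unset -nocomplain lockTime($page)
 }

Under normal load the timeout only breaks locks abandoned by editors who walked away; under a flood of edit-page requests it starts breaking locks that are still in use, so two saves can interleave in the shared file: the ''partial commit'' that corrupted the data.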
----

(Josh) Thanks Colin for your great efforts! Great attitude! Great spirit of initiative! You rolled up your sleeves, you spat in your hands, and you moved forward. Way to go! Unfortunately, in all modesty, I must admit that I am not versed enough in this sort of problem, but perhaps others are, and they might contribute solutions. They might even be able to contribute an algorithm of some kind. I doubt it, though, since you are playing in a very, very specialized area. But let's remain optimistic. My 2 Euros: wouldn't it be a good idea to only let in participants with passwords, the same way it is done in the chat? This way vandals won't be able to vandalize the wiki, and we could go back to the old Wikit. It seems to me (and to a lot of wiki webmasters) that times have changed: there are way too many cuckoos out there, so we cannot leave the gate open at night like we did in the old days. This wiki has always been very peaceful thanks to the fine and dedicated participants from all around the world; the participants have never caused a single problem here. Vandals are the ones who screwed up the wiki. They shouldn't have access to the wiki in the first place. We should close the gate.

[stevel] Josh, this is a regular question and there's a regular answer ;) Consider the analogy of a shop window. Occasionally you get vandalism, but the solution isn't to board up the shop window, but rather to replace the glass on the rare occasions it is smashed, and perhaps install some security lighting. The wiki is Tcl's shop front, and so we want to avoid boarding it up. One design goal of the wiki is to avoid barriers to people contributing (even if that means occasional vandalism). That's why we don't require passwords.

(Josh) Great answer! Thanks Steve. IMHO, in the absence of a technical solution, however, the implementation of a password system could be the answer. Is it possible to examine how other wikis have fixed the very same problem and what solutions were implemented? I am sure they must also have been attacked by spiders.

[stevel] No, the implementation of a password system is explicitly not what we want. This decision has been quite deliberate and well considered over a number of years. Passwords are a barrier to people contributing, and they don't stop spammers. The spidering issue has been dealt with via a honeypot (visit wiki page 5 if you want to see it in action). Also, having people register before editing means we get a cookie on their browser, so we can detect persistent spammers should that become necessary. And once the revisions are working again, it will be easier to restore after vandalism. I'm not suggesting this system is perfect, but it is sufficient for now and preserves the open nature of this wiki. We could do a lot worse.

[dkf]: There are a number of ways to implement anti-spam measures, and it has been a long-standing policy of the Tcler's Wiki to avoid techniques that discourage contribution. Instead, our policy in the past has been to rely on the community to spot spamming and revert it rapidly. A few technical components to support that policy are not yet back online, but experience shows that spam isn't a big problem when there is a large community of vigilant [Recent Changes] watchers. However, encouraging people to always contribute with a consistent ID (something which was not consistently done before) makes it easier to trace the activities of persistent and annoying scum, and to put in place measures to deal with them as necessary.

(Josh) We certainly all trust you guys are doing the best for this wiki. Interesting! I am new to this cookie security approach. In the password system, a spammer (or a vandal) requires a password; he gets it, he spams, and he keeps on spamming (or vandalizing) until his password is revoked. Then he gets a new password using another e-mail address, and he starts the same behaviour over again. '''Question:''' Is that what you're suggesting? The problem with your cookie approach, seemingly, is that you end up blocking complete IP networks just to stop one individual from spamming, whereas with the password system you only block one spammer at a time. But then again, I could very well be mistaken. Since the installed cookie cannot be edited, you could stop Joe Blow@142433 so that he won't be able to post from his computer anymore, but JamesK@142433 would still be able to post. Am I right? '''Question:''' Can you actually stop anyone from deleting the cookie on his computer and taking a new one? Went to page 5. Hey, you need good eyes to read the characters! I am sure more than one honest participant will be caught in the web. I also tried to make sense of what Colin wrote above: ''It was designed with locks and a lock breaking mechanism with a timeout of 10 minutes. The spidering was such that the timeouts timed out on valid edits, enabling multiple edits to be partially committed, thus causing corruption of data.'' It is very well written, well formulated, but perhaps not clear enough for non-experts like me. '''Question:''' What does this all mean? Can you provide examples? No offense! I am not trying to be mean or difficult or nosy; it's just that when I don't understand something, I ask questions. That's how I learn. :-)

[DKF]: Two reasons why I'm not giving details:

1. There are no fixed rules anyway; we can tune our response as we see fit. (Let the Kangaroo Court decide their fate!)
2. I don't want to give spammers a recipe for working around our response.

However, an example might be to review all changes they made while using a particular cookie, and to ban logins from their subnet (perhaps making it look like an unfortunate server bug), since it is fairly easy to find that info out and Tclers are mostly fairly dispersed (i.e., not too much risk of collateral damage being harmful).

(Josh) Great answer! Hazy but reasonable. If you can develop strategies to counter them, we are all for it. As you say, no need to get into details and give spammers and vandals recipes. I trust you can develop tools to counter them with the system you're putting in place, and this is what counts. I also trust the Kangaroo Court will judge in good conscience! Kangaroos have been known to be great judges! :-) They jump around like all good judges do! :-) But do they fall asleep in the middle of a court session, like judges sometimes do? As for the mysterious paragraph (''It was designed with locks and a lock breaking mechanism with a timeout of 10 minutes. The spidering was such that the timeouts timed out on valid edits, enabling multiple edits to be partially committed, thus causing corruption of data.''), any further enlightenment? All in all, thanks so much to the Tclers' New Zealand and Australia connection (Oceania), Colin and Steve (from Digital Smarties), for your help in this difficult time.
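An aside for the technically curious: a honeypot of the kind [stevel] mentions typically works by trapping any client that requests a page no well-behaved visitor would ever ask for. The sketch below is a guess at the general shape of such a trap, not this wiki's actual implementation; the /honeypot URL and the proc names are invented.

 # Guess at the shape of a spider honeypot; NOT this wiki's code.
 # The trap URL is disallowed in robots.txt and hidden from human
 # readers, so only a misbehaving crawler ever requests it.
 array set banned {}            ;# client address -> time it was trapped

 proc handleRequest {ip url} {
     global banned
     if {[info exists banned($ip)]} {
         return "403 Forbidden" ;# previously trapped: refuse service
     }
     if {$url eq "/honeypot"} {
         set banned($ip) [clock seconds]
         return "403 Forbidden" ;# sprang the trap: blacklist the client
     }
     return "200 OK"            ;# ordinary request: serve it
 }

The design cost is exactly the one Josh raises above: anything keyed on the client address can also catch honest users who happen to share that address.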
----

'''And the light came!''' :-)

[LV] Josh, basically, the best guess as to what happened was that the code to prevent multiple people from trying to update the single wiki file failed, and the single file was corrupted. Looking at log files seemed to indicate that around the time of the problem, a large number of files were being requested by a particular address. The files included the edit page urls - each one of which caused the initiation of a timer, initially designed to give someone about 10 minutes to edit and submit a page. When the software timers ran out, the system began allowing requests for those pages to occur. It seems likely that multiple updates occurred during this time of losing page edit locks, resulting in corrupted data.

(Josh) Thanks Larry! Now I get it! Or I am very close to getting it. So let's recap:

* At one point, the ''edit conflict'' code did not work properly, and therefore the same page could be edited by many participants in a short amount of time (instead of by one, with the others getting an edit conflict message), thus corrupting the data.
** Example: someone came at 1:35 and started editing. He saved the text at 1:42. At 1:37, someone else saved the page. When the guy saved at 1:42, the edit was accepted (it shouldn't have been), and therefore the edit made at 1:37 was gone! So far so good. Of course, had the system worked properly, the guy saving at 1:42 would have gotten an edit conflict message saying that between 1:35 and 1:42, at 1:37 precisely, someone else had edited the page. This did not happen. OK. Understood.
* Who noticed that the data was corrupted?
* You write: ''Looking at log files seemed to indicate that around the time of the problem, a large number of files were being requested by a particular address''. So there was this creep somewhere in la-la land using a device to put the server out of service by requesting too many pages at the same time, way more than the server could handle. And this creep's actions were shown in the logs. OK.
* By making so many requests at the same time, the creep put the edit conflict mechanism out of action, thus causing the corruption of the data. Right?
* Then you write: ''The files included the edit page urls - each one of which caused the initiation of a timer, initially designed to give someone about 10 minutes to edit and submit a page. When the software timers ran out, the system began allowing requests for those pages to occur.'' Therefore the 10-minute timeout code did not work either, due to the overload of the server, and the server accepted the edits.
** I understand that a user has 10 minutes to save the page from the time he starts editing. The idea is that this way the server does not work excessively, timing out ceaselessly. Right?
* And finally, you write: ''It seems likely that multiple updates occurred during this time of losing page edit locks, resulting in corrupted data.'' It's all well said. Very clear. Thanks!

Can we then say it this way: ''Both the edit conflict function and the edit lock function (which renders an edit unsavable after 10 minutes) did not work properly, or did not work at all, because of the stress caused by the vandal's multiple requests on the server; multiple edits were made to the same pages in no orderly fashion, and thus the data got corrupted.''

Now let's re-read what Colin has written in the light of your comment: ''The previous incarnation was swamped by spiders. It was designed with locks and a lock breaking mechanism with a timeout of 10 minutes. The spidering was such that the timeouts timed out on valid edits, enabling multiple edits to be partially committed, thus causing corruption of data.'' That is more or less the same thing. It's the ''partially committed'' phrase that I still don't get. Otherwise, Colin's explanation was well phrased; I simply did not have the proper background information and technical knowledge to understand it. Now I do! OK. Thanks again!

Considering the above, doesn't it make sense to put in place a mechanism that won't allow '''any''' user to make more than, say, 5 edits every 5 minutes? (A sketch of such a mechanism appears below.) I have seen this done on other wikis. I suspect it was done to counter such server attacks, but also to curb the enthusiasm of certain obsessive-compulsive and other similar users who were posting way too often :-) It worked very well. I believe the name of this function was time-limited editing, or timed edits. The creep was probably spidering all wikis in the hope that such a mechanism had not been put in place. He came here, and it worked. This being said, it was a very wise decision to close the wiki altogether to fix the problem, and it was also another wise decision to go slowly but surely in fixing it. I would appreciate comments concerning the possible implementation of this timed-edits solution. This timed-edits solution, coupled with the Tclers' Oceania connection's efforts to strengthen the server, should bring an excellent solution to the current problem, I believe. With such a protection, I fail to see how a spider could cause any strain on the server! Hopefully the code will be so good that it will cause strain on the creep's computer. That I would like! :-)

'''Later''' I went on a MoinMoin wiki, and I had to save within ten minutes of clicking the edit button. This was clearly indicated. Sounds familiar? Well, I believe this wiki works on the same principle; therefore my example above is '''not''' valid.
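For concreteness, here is a minimal sketch of the timed-edits idea as a sliding-window rate limiter, using Josh's figure of 5 edits per 5 minutes. It is an illustration under those assumptions, not code from this wiki; the proc and variable names are invented.

 # Sliding-window rate limiter: at most $maxEdits saves per user
 # in any $window seconds.  Illustrative sketch, not deployed code.
 set maxEdits 5
 set window 300                 ;# window length: 5 minutes
 array set saves {}             ;# user (or address) -> recent save times

 proc allowEdit {user} {
     global saves maxEdits window
     set now [clock seconds]
     # Keep only the saves that still fall inside the current window.
     set recent {}
     if {[info exists saves($user)]} {
         foreach t $saves($user) {
             if {$now - $t < $window} {
                 lappend recent $t
             }
         }
     }
     if {[llength $recent] >= $maxEdits} {
         set saves($user) $recent
         return 0               ;# over the limit: reject this save
     }
     lappend recent $now
     set saves($user) $recent
     return 1                   ;# under the limit: allow the save
 }

Note that this throttles saves; to address the attack [LV] describes, which merely ''requested'' edit pages, the same check would presumably have to apply to edit-page requests as well.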
The way I presume the algorithm works here is this: when A clicks on edit, he has 10 minutes to save his edit; he has, more or less, taken control of the page. During those ten minutes, anyone else who wants to edit will be shown the edit conflict message and won't be able to save. I therefore think that when the server went berserk under the creep's attacks, anyone could save while A was within his 10-minute period; consequently there was ''total chaos'', with everyone deleting each other's edits, and the data got corrupted.

Now I understand Colin's paragraph much better: ''The spidering was such that the timeouts timed out on valid edits, enabling multiple edits to be partially committed, thus causing corruption of data.'' In other words, 2 or 3 minutes after clicking on edit, the timeout was called and the editor could not save his text! I think Colin means: ''enabling multiple edits to be '''corrupted''', thus causing corruption of data''. In fact there was a single locking problem, since edit conflicts and locks are related. Anyway, the solution remains the same: as mentioned above, I believe not allowing more than a certain number of edits per user in a given time frame could also be part of the solution.

----

[Category Wikit] | [Category Discussion]