Back!

For those who were not aware, MontrealTechWatch was down from early yesterday morning until 8:00 pm today, July 17.
It might seem normal, in the order of things, that a server comes back; for most sysadmins, it's just a matter of opening a ticket and tech support somehow restarts the whole thing. But this time was radically different. Just a few hours ago, the server was considered unrecoverable *sweats*, and with it the databases **shivers** plus all files generated over the past 2 years ***faints***. We tried one last hack, which miraculously worked.
For those curious about the technical details: this server hosts many websites and services. For instance, it hosts a RoR site (graciously hosted, since it belongs to a friend) plus another experiment using Phusion Passenger. I discovered that mod_rails has big memory problems and leaves dead processes lying around, which I intended to solve by writing a god-like Ruby script that would kill and clean up processes, even when the parent process was defunct and couldn't be killed. Fast-forward to yesterday morning: this script ran the system command kill -9 1 (SIGKILL sent to PID 1, the init process) … with the script owned by the root user … which is the equivalent of shooting yourself in the head … while jumping from a plane at 30,000 feet. XenServer couldn't even restart, restore the snapshot backups, relaunch, or be set up again, and all files and databases were deemed lost and inaccessible.
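For readers wondering how a cleanup script can end up signalling PID 1, here is a minimal sketch of the kind of guard such a script needs before it calls kill. It is written in Python rather than the original Ruby (which isn't shown here), and the names is_safe_target and safe_kill are hypothetical. It assumes a POSIX system such as Linux.

```python
import os
import signal


def is_safe_target(pid: int) -> bool:
    """Return True only if `pid` names a single, ordinary process.

    Hypothetical guard: PIDs at or below 1 are never valid targets.
    1 is init, 0 means "the caller's whole process group", and negative
    values address process groups (-1 means every process the caller
    is allowed to signal).
    """
    if pid <= 1:
        return False
    # Never kill the reaper itself or the process that launched it.
    if pid in (os.getpid(), os.getppid()):
        return False
    return True


def safe_kill(pid: int, sig: int = signal.SIGKILL, dry_run: bool = False) -> bool:
    """Send `sig` to `pid` only after validating the target.

    Raises ValueError for unsafe targets. Returns True if the signal was
    sent (or would have been, in dry-run mode), and False if the process
    had already exited.
    """
    if not is_safe_target(pid):
        raise ValueError(f"refusing to signal unsafe pid {pid}")
    if dry_run:
        return True
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The process is already gone; nothing left to clean up.
        return False
    return True
```

Note that no signal removes a defunct (zombie) process, because it has already exited. It disappears only when its parent reaps it with wait(), or, if the parent itself exits, once it is reparented to init (or a subreaper), which reaps it.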
MTW is taken very seriously and I know how important it is; this should never happen again. There's only one thing to blame here: running experimental scripts on a production server. If this were a company, I would have fired the Linux idiot who wrote the script. Oh wait… Anyway, thanks to everyone who was there, it's much appreciated. I'll look into getting an additional resource as a sandbox and building a bulletproof environment for MTW.

Yeah, who hired that Linux guy?? ;)
Glad you’re back! That was a close one. I’ll take some time myself to make sure that I have an offline backup as well…
Welcome back Heri! We were jonesing without our daily dose of MTW.
Thanks, guys!
Hopefully I'll catch up and publish a couple more articles.
We spotted something similar with v2.0.1 of Passenger. After my colleagues and I debugged it with the mod_rails developers, they released two patches in v2.0.2 that solved the issue for us. Before the patch, our installation was eating up all available memory address space and eventually died. Consider upgrading?
Glad to hear things are back!
Glad you’re back, your site is indeed very useful.
SmTt
Welcome back Heri!
Glad MTW is back. That kind of thing has happened to the very best lately.