bjarvis ([personal profile] bjarvis) wrote 2007-07-25 09:51 am

LJ returns, not that it should have gone anywhere in the first place...

LJ isn't such a huge portion of my life that I was severely inconvenienced by last night's outage. God knows I had enough other things to work on.

Still, how is it that a single power outage can take down the entire system? Do they not have backup power, servers distributed across more than one data center to provide redundancy, etc.?

Six Apart, you're looking like a bunch of high school amateurs. Time to step up to the first tier of professionalism.

[personal profile] jss 2007-07-25 01:58 pm (UTC)(link)
Respectively: Incompetence and ineptitude in planning, design, and forethought; apparently not, or at least not enough to have covered the length of the initial outage; no.

Six Apart also left LiveJournal for last in their recovery process, doing all of their other services (TypeKey, TypePad Service, TypePad Blogs, and Vox, in that order) first.

[identity profile] abqdan.livejournal.com 2007-07-25 02:00 pm (UTC)(link)
Proving once again it's hard to make money from 'free'. Although they have a growing population of paying customers, I suspect they have millions of free-loaders, just like me! I wonder what their revenues look like?

Still, I'd expect a lot of these companies to host at a hosting company with multiple redundant power, POPs, network routers etc. The IAGSDC web site does after all! (see http://www.serverpoint.com/english/ournetwork.phtml for a diagram).

[identity profile] manley1.livejournal.com 2007-07-25 02:18 pm (UTC)(link)
Heh, we couldn't throw stones when I worked at Harvard. The building where I sat while working for the administration for the first few years didn't even have a generator, and the UPS only lasted 10 minutes. Then, when I worked for FAS, a blown UPS knocked us completely off the wire for many hours.

Hell, where I work now there is only a small series of mini UPSes powering our local machine room. Our real machines are in remote machine rooms, however, with redundant power and the like.

I guess the point I'm trying to make, if there is one, is that their practices are not uncommon, it's sad to say. I feel bad for them.

[identity profile] ciddyguy.livejournal.com 2007-07-25 02:24 pm (UTC)(link)
I know Six Apart is not perfect, but I also know that LJ suffered similar fates, with backup power not functioning like it should, even before it went to Six Apart. That said, San Francisco did get hit with a major power outage that took out some 30K customers, including a huge data center, which affected not only LJ and Six Apart but Craigslist and a couple of other websites too.

So I'm not inclined to totally blame Six Apart, even though they did leave LJ to be brought up last. I don't know if that was intentional or not, but still, by 8-something last night, LJ came back up.

Anyway, I know many of us had to suffer LJ withdrawal for a few hours after we came home from work. :-)

[identity profile] rsc.livejournal.com 2007-07-25 03:48 pm (UTC)(link)
Presumably they had to bring up their services in some order, and something was going to be last. No idea how they decided.

[identity profile] jwg.livejournal.com 2007-07-25 03:52 pm (UTC)(link)
Hmm, and they didn't have a bunch of squirrels and treadmills.

[identity profile] bjarvis.livejournal.com 2007-07-26 12:38 pm (UTC)(link)
PETA was getting on their asses about the squirrels. They should use grad students instead: no one cares what happens to them. :-)

Those who do not remember the past ...

[identity profile] allanh.livejournal.com 2007-07-26 03:20 am (UTC)(link)
The current news is that 365 Main's data center (located on swamp landfill in South of Market San Francisco) uses a flywheel power transfer device to avoid allowing PG&E (utility) power to directly touch any building systems.

One sustained power outage would have triggered the diesel backup generator into action, which would then have kept the flywheel spinning.

Unfortunately, nobody at 365 Main ever stopped to think about SIX short power outages, spaced closely together.

Power went off and came on too quickly the first time for the built-in timer to kick in the generator ... so the flywheel kept operating on utility power, but was spinning a little bit slower.

Power went off and came on too quickly the second time ... and the generator didn't kick in, and the flywheel kept on spinning down.

Repeat above step four more times, and the flywheel wasn't spinning fast enough to transfer power into the data center ... and there's no provision for a slowly spinning flywheel triggering the generator.
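To make the sequence above concrete, here is a toy Python sketch of the failure mode as I've described it. It is not how 365 Main's equipment actually works: the function name, the start delay, the spin-down rate, and the speed thresholds are all made-up numbers chosen only so that the sixth short outage is the one that drops the load. The optional `low_speed_trigger` models the provision that was apparently missing.

```python
def simulate_outages(durations, generator_delay=5.0, loss_per_second=0.03,
                     min_speed=0.5, low_speed_trigger=None):
    """Toy model of a flywheel UPS with a timer-started generator.

    durations: utility outage lengths in seconds, assumed closely spaced,
    so the flywheel does not regain speed between them (per the account above).
    All default values are hypothetical, for illustration only.
    """
    speed = 1.0            # flywheel speed as a fraction of full speed
    generator_on = False
    for duration in durations:
        if generator_on:
            break          # generator now holds the flywheel at its current speed
        # The flywheel coasts until utility power returns or the start timer expires.
        coast = min(duration, generator_delay)
        speed -= loss_per_second * coast
        if duration >= generator_delay:
            generator_on = True   # sustained outage: the timer starts the generator
        elif low_speed_trigger is not None and speed < low_speed_trigger:
            generator_on = True   # hypothetical extra provision: start on low speed
    return {
        "speed": round(speed, 2),
        "generator_on": generator_on,
        "load_dropped": speed < min_speed,
    }


if __name__ == "__main__":
    print(simulate_outages([60]))                             # one sustained outage
    print(simulate_outages([3] * 6))                          # six short outages
    print(simulate_outages([3] * 6, low_speed_trigger=0.7))   # with a low-speed start
```

With these invented numbers, the single long outage starts the generator with the flywheel still well above the cutoff, the six short ones never start it and the load drops, and the low-speed trigger catches the problem on the fourth blip.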

And none of the (totally stupid) employees at 365 Main thought to slap the manual cutover switch IMMEDIATELY upon being signaled of the first power outage.

That, or even more horrifyingly, they have no power outage signals in the control center.

Stupid.

What's worse ... this isn't the first time that downtown SF has had multiple small power failures leading to a big power failure. The same thing happened during the Loma Prieta Quake of '89.

Really stupid.

Re: Those who do not remember the past ...

[identity profile] bjarvis.livejournal.com 2007-07-26 12:38 pm (UTC)(link)
Yeah, stupid. The part that I find painful though is that there doesn't appear to be an alternate data center.

My current employer runs two data centers, although I'm not keen on them being quite so close together (about 50 miles). My prior employer had one in the greater DC area, the alternate in Montana.

It strikes me as spectacularly foolish that any online company would place all of their family jewels in a single data center, esp. one located on a former swamp inside an earthquake zone.

Dumb, dumb, dumb.