Grrrrr...

Aug. 19th, 2005 08:29 am
[personal profile] bjarvis
If I weren't so tired, I'd be rather pissed off right now. Or perhaps: if I weren't so pissed off, I'd have drifted off to sleep by now. Take your pick.


The older Sun Enterprise 5000 servers are coming close to their end-of-life so my employer has been replacing them with Sun Fire 4800/6800 models. I've been spearheading a project with our customer integration group for just such an upgrade.

Three weeks ago, I had the new server properly configured and ready to roll: all I needed was for the night shift to switch over from the old machine to the new one, then have the Sybase team update the database config. All would be ready within an hour, in time for operations the following morning, with no service interruption. Somehow, it all went to hell in the middle of the night, but no one (not on my own Unix team or the Sybase team) thought to call or page me or any of the other principals while the problem was unfolding: we all discovered it when we arrived at work the following morning and found we were still running on the old hardware. And they pretty much trashed the new server config I had built... it took most of the day to rebuild my work.

I had a few terse words with the appropriate people about that one.

Last night was our second big attempt to do the cutover. Again, I had everything carefully staged and prepared. Again, all the night shift had to do was switch to the new machine and have the Sybase team make their config changes. They also had numerous direct instructions to call or page me if anything should look even slightly less than perfect.

Having a very low tolerance for failure, I stayed up until 1 AM, watching the progress from home over the network. I slept for a while, then got up at 4. There were no e-mail or pager updates so I sent a page to the night crew asking for a status report. I got nothing. I tried telephoning and got only voice-mail. I paged again and called the command center but there were no updates. At 5 AM, I gave up, got dressed and headed to the office to see for myself.

Sure enough, there were problems. Two file systems were corrupted. After some drilling of the rather clueless Sybase weasel, we eventually discovered that the Sybase engine was attempting to manipulate those file systems as raw volumes, not as file systems. Further, we discovered that they had munged the backup taken from the old machine the prior evening. Did anyone page me? No. Did they follow procedure and escalate this to their managers? No. Grrrrr...

At 7 AM, I managed to corral the appropriate people on a conference call, removed the clueless person from the project and replaced her with someone I knew was competent. As I write this, we're reloading the data into the database and racing against a 9 AM deadline. I think we'll make it but it will be close. If not, I'll be hosting a meeting with certain staff & managers this afternoon where some folks can explain in painful detail the events of the evening.

An update

Date: 2005-08-19 02:55 pm (UTC)
From: [identity profile] bjarvis.livejournal.com
Good news: the "all clear" has just been declared. The new server has been running in place for almost two hours and all tests have confirmed it's in good shape.

Whew.

Date: 2005-08-19 05:17 pm (UTC)
From: [identity profile] excessor.livejournal.com
I've been in a similar situation. Since the outlawing of torture, I've resorted to painful and tedious post-project assessments (PPAs) as a way of reminding people that expectations were not met. If necessary, PPA recommendations appear in year-end evaluations. The point generally gets made over and over again.

But that's just me.

Date: 2005-08-19 06:22 pm (UTC)
From: [identity profile] bjarvis.livejournal.com
They've outlawed torture? Damn... that was one of my more effective tools. :-^

Date: 2005-08-19 06:25 pm (UTC)
From: [identity profile] excessor.livejournal.com
One longs for the good ol' days.
