If I weren't so tired, I'd be rather pissed off right now. Or perhaps: if I weren't so pissed off, I'd have drifted off to sleep by now. Take your pick.
The older Sun Enterprise 5000 servers are nearing end-of-life, so my employer has been replacing them with Sun Fire 4800/6800 models. I've been spearheading a project with our customer integration group for just such an upgrade.
Three weeks ago, I had the new server properly configured and ready to roll: all that remained was for the night shift to switch over from the old machine to the new one, then for the Sybase team to update the database config. The whole job should have taken an hour, leaving everything ready for operations the following morning with no service interruption. Somehow, it all went to hell in the middle of the night, yet no one on either my own Unix team or the Sybase team thought to call or page me or any of the other principals while there was an issue. We all discovered the problems when we arrived at work the following morning to find we were still running on the old hardware. And they had pretty much trashed the new server config I had built... it took most of the day to rebuild my work.
I had a few terse words with the appropriate people about that one.
Last night was our second big attempt at the cutover. Again, I had everything carefully staged and prepared. Again, all the night shift had to do was switch to the new machine, and all the Sybase team had to do was make their config changes. They also had numerous direct instructions to call or page me if anything looked even slightly less than perfect.
Having a very low tolerance for failure, I stayed up until 1 AM, watching the progress from home over the network. I slept for a while, then got up at 4. There were no e-mail or pager updates so I sent a page to the night crew asking for a status report. I got nothing. I tried telephoning and got only voice-mail. I paged again and called the command center but there were no updates. At 5 AM, I gave up, got dressed and headed to the office to see for myself.
Sure enough, there were problems: two file systems were corrupted. After some grilling of the rather clueless Sybase weasel, we eventually discovered that the Sybase engine had been trying to manipulate those file systems as raw volumes rather than as file systems. Worse, they had munged the backup they took from the old machine the prior evening. Did anyone page me? No. Did they follow procedure and escalate this to their managers? No. Grrrrr...
At 7 AM, I managed to corral the appropriate people on a conference call, removed the clueless person from the project, and replaced her with someone I knew was competent. As I write this, we're reloading the data into the database and racing against a 9 AM deadline. I think we'll make it, but it will be close. If not, I'll be hosting a meeting with certain staff & managers this afternoon where some folks can explain in painful detail the events of the evening.
An update (2005-08-19 02:55 pm UTC): Whew.