Night Shift
Oct. 20th, 2011 03:13 pm
OMG, what a night. It's 2pm and I've only just crawled out of bed, hosed myself down in the shower and eaten a quick 'breakfast.'
It's all work-related: I've been accumulating a long list of tasks for the rare opportunity of a complete system shutdown, and my team had been agitating for one for three weeks. We finally received approval 48 hours before our proposed date, but even that was screwed up: the team responsible for warning customers about maintenance windows neglected to inform one of our largest clients. Somehow, 'complete shutdown' had been interpreted as 'partial shutdown.' Infuriating and sloppy, but there was nothing to be done about it this close to the event. We dropped the servers we could no longer touch from the plan, keeping only the systems we could power-cycle without customer impact.
I arrived early at the data center for several hours of preparation: setting up a workspace, labeling the bits that would be repaired or touched, laying out the tools I would need and such.
The fun began at midnight EST. The team in California halted the performance monitors, disabled customer access and began installing system patches while I worked with the Fujitsu field engineer to add disks to our storage array. When those tasks were complete, I upgraded the RAM in ten HP blade servers and fixed iLO communication errors in eight more. Once the hardware work was done, the California team began restarting and testing the applications while I moved on to other, less critical tasks: swapping some network cables, tracing cables for three servers, and racking & cabling one new server.
We re-opened customer access at 3am EST; while we didn't have 100% of the systems back up quite yet, we easily had enough to support any customer traffic at that hour of the night. Within 30 minutes, we had the rest of the systems working fully.
I cleaned up the workspace, collected my gear and headed home around 3:40am. I think my head hit the pillow at 4:30am or so.
Now to resume planning for a full & complete system shutdown window so we can do the maintenance I really wanted to do in the first place...