MV Communications: 1999 outages/events

1999 outages/events

Dec 29 00:30
We're working on a minor upgrade to the news server software; the news server is unavailable until this upgrade is finished. This is expected to take less than an hour.

Dec 22, the test numbers
The test equipment on the trial numbers (xxx-6388) has been down since about 6:00PM; we will be working to get it back up but have no ETA as yet. Remember that this is a test number: if you are using it, you should be prepared to fall back to the normal numbers if it is unavailable.
Status: Back up as of about 11:00 PM. The equipment was unavailable as a result of an upgrade to a new release of the software for it; there was a routing problem with the new release and we had to give up on it temporarily and back off to the older one. We will be working with the new release at another time (although now that we know that it will take more than a few seconds just to upgrade the software, we will plan it as a normal outage).

Dec 20 AM
We will be doing some power work in our Manchester NOC starting at around 2:30AM Monday morning. This notice corrects earlier notices.

Dec 12/13 shell server
Starting at midnight the transition from the current shell server to the new shell server will begin. Both servers (mv.mv.com and iridium.mv.net) will be unavailable while this is happening. The new shell server should be up and available by 6AM Monday if not sooner. Those shell users who telnet to the shell server should telnet to the new server as shell.mv.net.
Status: The new shell server is in place as of about 5:00. Please let us know of any problems observed with it.

Dec 8 1999 DNH
At 13:00 (1PM) the Litchfield end of the T1 to our Dover POP ("dnh" providing dialup service on 740-9152) will be moved to our Manchester NOC. We expect this to take between 15 and 30 minutes; during this period the "dnh" POP will not be reachable.
Status: The move is done, and took less than 5 minutes.

Dec 6 1999 MNH
At 13:00 (1PM) the Litchfield end of the T1 to our Manchester POP ("mnh" providing dialup service on 645-4986) will be moved to our Manchester NOC. We expect this to take between 15 and 30 minutes; during this period the "mnh" POP will not be reachable.
Status: The move is done, and took about 25 minutes.

Nov 22 Nashua
A recurrence of yesterday's link trouble started this morning at 05:50 and lasted until 09:10. It was corrected, but soon recurred a second time, lasting from 09:50 until 10:15.
Bell will be taking another look at their equipment. They don't yet understand the cause of the apparent lockups of that equipment. They plan to stress test the line if another lockup should occur.
In the meantime, most routing is set to bypass the suspect link, so the widespread problems that occurred yesterday should not recur at that scale. The Nashua site (886-7124) will be the most harshly affected if the link should go down again. The Brooks dial-in numbers should not be affected.

Nov 21
Routing issue prevents users from accessing non-MV sites. Sites available again at 11:15am.

Nov 9-10 BNH
A number of callers reported experiencing busy signals when calling the BNH numbers. (There is excess capacity in the BNH dialin numbers and busy signals should not happen except perhaps in extreme or emergency situations.) We tracked the problem down to one of our PM3 remote access servers (which handles modem and ISDN calls) that had decided to mark 30 of its modems as DOWN. Whenever a modem call happened to come into that PM3 and whenever the other modems in that PM3 were all in use, that PM3 would indicate a busy condition to the telephone company. That PM3 has been reset and no longer has any modems marked DOWN.

Nov 8 Litchfield
The town of Litchfield experienced another power outage at about 6:30PM, taking out the Litchfield site. Power was restored just before 8:00PM.

Nov 3 Litchfield (part two)
Around 11AM we experienced the loss of three T1s going into our Litchfield facility. One of these T1s was the same one that went down during the overnight storm, with the same impact; the problem was corrected at about 2PM and was reported to be caused by an unspecified error in the Bell Atlantic central office. The other two T1s (one to our DNH/Dover POP and the other to our MNH/Manchester POP) were taken down because of a human error at Bell Atlantic; these were restored at about 4:30PM.

Nov 3 Litchfield
A widespread power outage caused by the wind/rain storm took out the Litchfield site from about 12:05 through 2:30. Mainly affected were frame relay customers and users dialing into the MNH, LNH, and DNH POPs (but not the BNH numbers covering those areas).

October 21/22 - T1 moves
The scheduled move of two T1 circuits (see next two items) has been delayed again due to Bell Atlantic engineering problems. We'll get this done yet.

Oct 21 1999 MNH
At 13:00 (1PM) the Litchfield end of the T1 to our Manchester POP ("mnh" providing dialup service on 645-4986) will be moved to our Manchester NOC. We expect this to take between 15 and 30 minutes; during this period the "mnh" POP will not be reachable.
Note: At 13:15 this is still a moving target as Bell Atlantic chases down the new configuration.
Note: Postponed indefinitely due to Bell Atlantic engineering problems.

Oct 22 1999 DNH
At 13:00 (1PM) the Litchfield end of the T1 to our Dover POP ("dnh" providing dialup service on 740-9152) will be moved to our Manchester NOC. We expect this to take between 15 and 30 minutes; during this period the "dnh" POP will not be reachable.
Note: Postponed indefinitely due to Bell Atlantic engineering problems.

Oct 1 1999 Nashua T1
The T1 to Nashua has been down since around 17:45. This is impacting one of our backbone connections as well as users dialed into the Nashua Bell Atlantic number (886-7124). Bell Atlantic is looking into the problem; no ETR as yet.
Status 23:30 : Bell Atlantic replaced an NIU in Manchester and the circuit is now back up and looking good.

September 29 Network sluggishness
Early this morning our link to BBNplanet was observed to be dropping packets. We took the link down in order to eliminate the lossy path, and packets were automatically re-routed via our other backbone connections. Bell Atlantic tested the link and (as often happens) the act of testing it cleared up whatever problem there was. The circuit was put back into place around 12:30.

On a more global scale, early this afternoon there was a major fiber cut in Ohio, affecting not only Ohio but also a lot of east-coast/west-coast Internet traffic. You will likely see a major slowdown in reaching some destinations until this is repaired.
Update Sep 30: Reported repaired as of around 2:50AM.

July 6 17:00
There are several storm-related outages as a cold front passes through, dropping the temperature 18 degrees over an 8-minute period and releasing a lot of energy from the 95-degree heat. The most significant outage is that the power is out in Litchfield; this mainly affects Frame Relay customers and callers dialing into the old analog numbers at lnh (Litchfield), mnh (Manchester), and dnh (Dover). We expect this service to be restored when the power comes back in that town. The newer digital "bnh" numbers serving those areas are not affected.
PSNH is currently dealing with the huge number of outages throughout New Hampshire and is calling in additional crews from outside the state to assist in restoring power to the affected areas. They have given an estimate of tomorrow as to when power might be back on in the areas of our affected sites.
Status: July 7 9:05 The power, frame relay, and analog connections are back up.

July 6 17:00
The news server took a hit during the storm and should be back up by about 18:10.
(It was.)

June 22 mv.mv.com shell server
The shell server has been down today due to disk drive errors. The staff is working on fixing it.
Status: 18:00 The shell server is back up. The server crashed this morning at around 8:00 with a hard drive failure. This drive contained mostly system and staff files, but it also contained the mail directory (where the incoming mailboxes are stored). We replaced the drive with a spare, but the data from the failed drive was not recoverable, so we had to restore data from a backup tape. We restored from a full backup that was made over the weekend and then from an incremental backup that was made Monday (yesterday) morning. The only data that should have been lost was any data that was changed after the Monday morning backup.

June 5 nameservers
We had some problems with the ns1.mv.net and ns2.mv.net nameservers today; ns2 crashed twice and ns1 several times more than that. We tracked it down to a bug in the nameserver code for which we also found a small patch; this patch has been applied and we expect that it will fix the problem. We don't know why the problem hasn't been triggered before now; we've been running this particular version of the code for about two months. But bugs can be like that: sometimes they won't show up for a long time and then suddenly rise up to bite you.

June 4 mv.mv.com shell server
The shell server will be taken down at 3pm today to replace a disk drive that is showing errors. Should be back within a couple of hours.
Status: Back up as of about 4:55.

May 20 1999, Nashua
The Nashua move twice scheduled previously is now finally on for May 20 at 6AM. See the News section for details.
Status May 20: The move is complete. Site went down at 6:30AM and was back up again at about 8:25AM

May 11 1999: 800 number
Our support 800 number (1-800-MVC-NETS) had a breakdown on Tuesday May 11; the carrier somehow switched it so that it is ringing at some other business. We are working on getting this fixed and we have an ETA of before noon May 12.
Status May 12 11:42 : Number is now working correctly.

May 3 1999: BNH
Several remote access servers (these are the devices that handle your calls) went down at about 12:30AM when one of our UPS units gave out. These devices were moved to a different power source and were back up by 1:30AM. Hopefully we'll know more about the power problem soon.

April 30 1999: Frame Relay
As of about 1:20 AM Bell Atlantic experienced a Frame Relay switch failure, categorized as a major switch outage. ALL frame relay customers are affected; Bell Atlantic is working on the problem.
Status 6:00AM: The problem has been narrowed down to a couple of cards in Bell Atlantic's frame relay switch; cards are being sent up via courier and BA does not expect the switch to be back up until at least 9AM.
Status 10:30am: Frame Relay outage. Bell Atlantic is still working on the outage; the contact I spoke with said that technicians were working on it and hoped to have it up soon. All frame customer contacts have been notified about the outage. If you are a frame customer and did not receive a message from us, please contact support to confirm your contact information.
Status 10:34am: Frame Relay back up. Bell repaired the equipment, and it all appears to be back up.

Nashua link April 10-11
Starting late Saturday afternoon we began experiencing problems (large numbers of data errors) on our link to Nashua. This problem affected our connection to the Internet through Nashua, and also would impact users dialing into the Bell Atlantic numbers in Nashua (but NOT the BNH Nashua number). We opened a trouble ticket with Bell Atlantic and they took the circuit for testing on Sunday. While they reported finding no problems, they also reseated a line card; after they were through with it the problems cleared up. (Reseating the card could have had an effect; however, as we have experienced many times in the past, sometimes the act of running tests on a circuit clears whatever was causing problems.) The line was back up and running cleanly by about 19:35 April 11.

April 10 shell server
The shell server suffered an outage for about 90 minutes this evening; this was traced to the CPU cooling fan not spinning, resulting in data errors (yes, those CPU fans really are necessary). It being Saturday night, we simply opened up the server and restarted the fan by hand. Even though the fan was running at that point, we left the server open and pointed a larger fan at it. We'll deal with it more appropriately during the week when we have more options. (Note that the shell server is scheduled to be replaced soon.)

Saturday April 10
On Saturday the PSNH power to our building will be turned off for about 5 hours. A generator will be powering the building during this period; however, there will be a loss of power at the beginning as they wire the generator in, and again at the end as they remove the generator. Each of these short periods is expected to last about 15 minutes. Fifteen minutes is about the limit of our UPS (battery backup) capacity, so we do anticipate an outage in each case.
The first outage should occur in the 7:00AM to 7:30 timeframe;
the second one when the work is done, expected to be somewhere between 10:30 and 12:00.
We will also take this opportunity to make some changes to the news server (replace a power supply, and move it physically); we expect to do this in conjunction with the first outage.
Status 14:55: All done. The deal here was that electrical workers were upgrading the power entrance to our building. They needed to cut power to the building in the morning and switch the building power over to a generator, and then at the end of the effort switch back from the generator to PSNH power. Each transition required an outage. The first outage happened at around 7:15AM and went reasonably well, although we did have to take down a couple of servers. The second outage happened at about 13:40; the contractors decided at that time that they needed around an hour of downtime, so we shut everything down while they worked. Power was restored after about an hour and we were back up at around 14:55.

April 1 1999
The shell server (mv.mv.com) crashed at about 10:10 this morning while we were searching a backup tape and was back up at about 10:30. Repeating the attempt to search the backup tape yielded the same result; the system was again back up at 10:55.

March 30 1999
At about 0:15 this morning the BNH access servers went down. An MV person is currently en route to investigate.
0:45 : The cause was a blown UPS (battery backup power) unit; the unit has been removed from service and the access servers are back in service (although temporarily without backup power).

Sun, 28 Mar 1999
(posted on 23 Mar)
On Sun Mar 28 between the hours of 2am and 4am there will be a short-term planned outage of some of our internal machines. Affected services are expected to include the mail servers, MV's main web server, and the shell account server; possibly others. We expect that none of these services will be unavailable for more than a fraction of the scheduled time period, but that each will be unavailable for a short time, one by one. That's wishful thinking, of course.

The items that are planned to be addressed during the outage include reinforcing the backup power systems on the main equipment, as well as completing a secondary internal local network to move a specific portion of the internal traffic load onto a separate layer.

- Rob @ MV Staff

Mon, 22 Mar 1999
On Mon, at 8:53 AM a utility pole near our Salem site went down. This knocked out power as well as our T1 to that site. Later in the afternoon power was rerouted to the Salem site, but Bell Atlantic reports that they will not be able to repair the damage to their equipment at the pole until all utility power repairs there have been completed. The time estimate so far is tomorrow at the earliest.

- Rob @ MV Staff
Status March 24 10:25: Service has been restored.

March 10 1999
One of our Internet backbone links (via Destek) went out at about 5AM today due to a hardware failure upstream from us. They expect to have it corrected by the end of the day; until then, our other backbone links are automatically taking up the slack. You may experience different performance than usual when getting to some sites, or not...
Status March 11 18:30: The outage lasted longer than predicted, but this link is now back up and operating as before.

Manchester March 3 3pm
At around 3pm there was a power hit in Manchester and surrounding areas (you may have seen it on the news). Most of our equipment was unaffected as it is on UPS power equipment; however, we have some UPS units out of service and we had three servers go down: news, mail, and web. Mail and web were back within a few minutes; news was back in about half an hour. Dialin modem servers, routers, authentication servers, nameservice, etc., were all protected and did not notice the outage.

Various servers Sunday Feb 28 1PM
On Sunday at 1PM we'll be doing some work on several of the servers (adding space, moving disks). Affected servers will be the shell server, the web server, the news server, and the secondary nameserver. We should be done by 3PM.
Status: Finished at about 4PM.

News Server Feb 25 1AM
The news server will be unavailable for a time starting at about 1AM, while we try an upgrade of the server software. This requires rebuilding some database indexes; it should be back by about 2AM.

Salem Feb 8 1999
The T1 to Salem went down as of about 17:15; Bell Atlantic has been notified.
Status 23:25: The link is back up.

News Feb 7 1999
14:45 the news server is being rebooted; should be back by about 15:25 or earlier.

Feb 3 1999 Concord
The T1 to our Concord POP is down as of about 12:38; Bell Atlantic is looking into the problem.
Note that this does not affect our bnh (Brooks) Concord area numbers, only the 228-7181 hunt group.
Status 13:25 : Circuit came back up after BA ran tests on it.


Copyright © 1998 thru 2008 MV Communications, Inc.