MV Communications: 2004 outages/events

2004 outages/events

Dec 23/24 2004: everything
There was a major power outage in downtown Manchester starting at about 7:30 December 23 lasting much longer than our UPS backups are expected to hold. Power came back a few hours later but went out again as we were finishing bringing systems back up. Things are back up again as of about 1:40 AM Dec 24.

Dec 21 2004: web server
The web server had a crash this morning at about the same time a router was being rebooted (next item).

Dec 21 2004: router
A router needed to be rebooted this morning at about 9AM. Some routing issues may have been seen prior to this (specifically around 1AM and around 7AM).

Nov 5 and Nov 7, 2004: router
A core router was misbehaving and needed to be rebooted. The cause is believed to be memory-related; we've made some changes to address that problem (and rebooted so that they take full effect). Users may have seen sluggishness and unresponsiveness during the periods leading up to the reboots.

Aug 30, 2004: News server
The news server has been having intermittent troubles over the weekend and today. We've tracked it down (or so we believe) to a non-working fan on one of the CPUs in the server. We're swapping that out and will bring the server back up once that's done. ETA 13:00 or earlier.
Update: Looks like it will be longer than that; no ETA.
Status: Up as of about 17:15. We had put in a faster CPU as well as a new fan; the operating system didn't care for the faster CPU, and we had to put the original back in.

Friday August 27, 2004: web server
The web server copper.mv.net took a timeout starting around 23:40 tonight; it was up but not responding. It was nudged and back up at about 23:20.

Sunday August 22, 2004: servers
shell server: As previously announced, we will be putting the new UNIX shell server into place this weekend. The shell server will be unavailable while this process is ongoing. This will take quite some time while all user accounts and data are transferred, ending any time from Sunday evening to Monday morning.
Status: Complete as of 20:45
mail server: We are also going to take this opportunity to start an upgrade process on the mail server. This will involve replacing a SCSI card and enabling another CPU on the server. The mail server should be down for a half hour or less while this is being done.
Status: complete as of about 17:15
secondary dialup: Also, we will be moving some of the equipment that accepts calls on the secondary dialup numbers (not the main group). The secondary numbers will be unavailable for a brief period while this is happening.
Status: complete as of 15:30

Aug 18, 2004: various power-related
Shortly after noon we had a power failure in our server room, caused by somebody (the company president, who should know better) overloading one of the UPS (battery backup power) units. The main effect was that two servers were out of operation for about 45 minutes; however, a couple of non-core routers were also down for a much shorter period.

Aug 7, 2004: shell server
The shell server crashed at about 5:10 due to unknown causes. Back up again at about 9:15.
Note: The shell server is being replaced (see the recent News Items and notes in the July newsletter).

July 1, 9:30, Paetec T3 move
Per the problem reported below (see June 28), our Paetec T3 circuit is scheduled to be moved to another router at 9:30 July 1. This will result in the T3 being down for a short while, with the load carried on our other backbone circuits. (This will probably not be noticeable to most people.)
Status 9:50 Move completed.

June 28, Paetec link
Once again, routing issues on our Paetec link made it necessary for us to disable that circuit until the issues can be resolved. This began around 5:30 PM.
Status June 29 3pm: Paetec has tracked the problem down to a software bug on their router that we are connected to. The issue will be corrected by a future upgrade; in the meantime we will be scheduling a move of our circuit to a different router that does not have this bug (different model with different software set).

June 28, networking
We had several periods today (first around 6AM, next around 8:30AM, and finally near 8PM) where one of our routers was not performing correctly. Specifically, it was being flooded and unable to keep up. During the day, by looking at performance graphs, we were able to form a theory about the source of the flood. When it recurred the final time (in the evening) we disconnected the suspect, which confirmed that it was indeed the cause.

June 9, routing
There are some routing problems in one of our backbones, outside of our network. We've turned that backbone off until the problem can be investigated further.
Status June 10, 12:30: The carrier resolved an issue overnight; this circuit is back up and carrying traffic again.

June 9, router
A central router crashed (unexpectedly) at around 11:15. It was back up after about 10 minutes.

May 27, MNH POP
At about 12:45 a power transformer serving the area near our MNH POP blew out. PSNH is on site; the repair is expected to take at least until tonight if not later. Affected are the small analog dialin group that we have there (645-4986), as well as some tenants in the building served by high speed access.
Status: May 28 02:30: Power is back

Saturday May 22, web server, scheduled
We'll be adding some disk space to the web server and rebuilding the root disk on the server as well. Most of this will be done "live" and won't necessitate much downtime, other than about three 5-10 minute outages to swap hardware in and out. The first reboot should occur around noontime, with the second and third occurring during the day as the operation progresses. Some of the disk space will be marked read-only during the course of the afternoon, in order to prevent changes to data that is being transferred to the new locations.
Status May 22: Now in progress
Status 15:45: Completed

May 12, routing
We're again (see the May 3 entry) seeing problems reaching a number of destinations via one of our backbone connections. We've shut down this circuit until it can be resolved. This may mean slower response times than normal reaching some sites, as the other backbone circuits have to pick up some extra load.
Status: The routing had stabilized when we tested it again shortly after midnight (May 13) and so that circuit is back up again. We'll continue to watch it.

May 6, router
Today we rebooted a core router to upgrade its software and take advantage of improvements in various areas. There were a couple of problems with the upgrade, requiring several additional reboots to correct. Each reboot took only a couple of minutes; however, some DSL DHCP clients again had trouble re-obtaining leases. This should be an automatic function on the part of DHCP clients, but we have seen that many clients fall short there. (DHCP server/client issues are, in fact, one of the areas motivating us to apply this upgrade.)

May 3, routing
Starting around noon we began seeing routing issues to some destinations from some sources (i.e., not everyone had a problem with a given destination). The problem is only occurring for some paths on one of our backbone connections. We had turned off that backbone for a while to route around the problems; however, that circuit is back on as of about 16:50 so that the provider can troubleshoot it.
17:30 Traffic on that circuit has been shut down again.
May 4 11:00 Circuit is being used for outbound traffic only.
14:00 Routing problems are no longer occurring; the circuit is now back up fully.

April 20, 2004: router
We had to reboot our core router due to a memory issue. While we had it down we took the opportunity to add some memory to it. Downtime was less than 10 minutes.
Note: We briefly considered using this opportunity to schedule an impromptu software upgrade but thought better of it.

April 15, 2004: news server
The news server was down from about 4AM through 9:45AM due to an unknown cause. (Perhaps it was overtaxed?)

April 2, 2004: news server
The news server is down as of about 7:30pm. It's expected back up by midnight.
Status: Up at about 11:40pm

March 24, 2004: backbone
Starting at 11AM we'll be doing some work on reconfiguring one of our backbone circuits (the Paetec T3). The circuit will be down for a short while during this process; we expect any impact to be minimal as the traffic will be distributed to our other backbone circuits.
Status: Finished by noon.

March 17, 2004: router
Our core router was unreachable for about 10 minutes, ending around 6pm tonight, due to a faulty cable connector.

Feb 28, 2004: news server
The usenet news server (news.mv.net) is down with a RAID/hardware failure. Coincidentally we have been on the verge of installing new news hardware/software (any week now). We will be spending some time today attempting to recover from the RAID failure and bring the existing system back up. If that can not be completed today, we'll simply work on bringing up the new system instead (a process that will take a day or two). More status will be here when we know.
Status: The news server is up as of about 15:15

Feb 15 2004: Router reboot
A core router had to be rebooted at around 4PM today to clear up a memory problem. The problem was affecting the router's behaviour. Any DSL customers having trouble after this reboot may need to release/renew their DHCP lease.

Jan 30 2004: frame relay
Our frame relay connectivity is down as of about 12:20. All frame customers are affected. Verizon has been notified and is looking into the trouble.
Status: Back up as of about 18:00 (Verizon repair)

Jan 16 2004: Mail server
The mail server is experiencing a temporary outage as of about 23:10. We expect to have it remedied within the hour.
Status: back up before 24:00. Cause of the outage was operator error upgrading a software package.
Note: The original posting incorrectly stated the outage start as 22:10; it was later corrected to the actual time (23:10).


Copyright © 1998 thru 2008 MV Communications, Inc.