MV Communications: 2001 outages/events
- November 15/16 Dialin access in Atkinson/Plaistow area
- Some customers are reporting problems calling our numbers
(both Verizon and Brooks) from this area. Verizon
is investigating.
Status: Problem appears to have cleared up by Saturday
morning (November 17).
- November 12 2001 -- DSL/C
- On Monday, November 12, ChoiceOne will be relocating the T3
between us and them to a different router. This will result in a
small amount of downtime for DSL/C customers starting around 10AM.
This is in preparation for adding some new capability as well as
preparing to change some of the routing parameters on DSL/C services.
Status Nov 12 10:15 : This switchover has begun.
Status Nov 12 10:56 : Switchover is complete.
- November 5, 2001 - MNH POP
- The T1 connecting our Manchester POP to our main facility is
down at this time. Verizon has been called for repair. This affects
our Manchester POP, which is the small dialin pool consisting of
V.34 (33.6Kbaud) analog dialup starting at 603-645-4986. This does
not affect our other Manchester V.90 dialin numbers or our 1-500
numbers servicing the Manchester area - only the older analog modem
pool.
Status: back up at 19:15
- November 5, 2001 -- mail server error with long names
- After the mail server upgrade this morning (see below), we did find one
problem: mailboxes with names of more than 16 characters did not work
because of a name length limit in one of the runtime libraries on the new
system. We've had to patch this in the past but believed that the newer OS
did not have this limitation. Once the problem was reported and identified
we made some code changes to allow the longer usernames; unfortunately some
mail to these mailboxes with long names did bounce in the interim.
- November 5 2001 early morning -- mail server
- During the wee hours Sunday morning November 5, 2001, we plan
to be replacing the current mail server. The new system will have
about ten times the horsepower of the current one, with more memory,
faster disks, more CPU. The replacement will require several hours
of downtime of the mail server while mailboxes and certain other files
are transferred to the new server. During this time you will not be
able to access your mailboxes -- this is unfortunate but necessary for
the transfer to occur. The operation will begin around 1AM.
Status 07:30 -- The new mail server is up and running
at this time. We will be watching it for any misbehaviour.
- November 2 2001 early AM
- In the wee hours of Friday morning (some time after 1AM) we will
be modifying some subnet parameters on systems in the network at our
Manchester facility. These changes may cause some brief disruptions
in the internal routing here while the procedure is ongoing.
Status: This operation was performed starting at 3AM
and ended up taking until about 5:45.
- Oct 29 08:00 2001 - news server
- The news server is out of disk space. The server is still
accessible, and reading of articles should not be affected. However,
new news articles are not currently being accepted due to the lack
of disk. We expect to have this corrected by early afternoon.
Status: Disk space freed.
- October 24, 2001, link to Destek
- At about 10:30 this morning our PVC to Destek (via high speed
ATM service) went down. This PVC provides one of our 4 connections
to Internet backbone services; the other connections are automatically
taking up the slack, but some slowdown may be seen as a result. Verizon
is working on the problem; we have no ETR as yet.
Status: Verizon corrected the problem, link was back up
as of about 18:15.
- Oct 22 04:00 2001 - news server
- The news server is out of disk space. The server is still
accessible, and reading of articles should not be affected. However,
new news articles are not currently being accepted due to the lack
of disk. We expect to have this corrected by early afternoon.
Status: Disk space freed.
- October 19, 2001: SSH access
- SSH access to the shell server has been turned off temporarily
due to a reported security problem with the server code.
Status: SSH daemon upgraded. Security issue resolved.
- Oct 11 2001 09:40 - news server
- The news server went down this morning due to a UPS failure. It
was unplugged from the bad UPS and then rebooted.
- Sep 23 2001 18:00 - web server
- The web server was down for a short period of time while
some filesystems were reorganized, and a failed CPU fan was
also replaced. Downtime was about 40 minutes, and the operation
went successfully.
- Sep 18 2001 - Internet
- The Internet is experiencing worldwide problems as a new worm
is infecting and attacking certain Microsoft-based servers. This
is affecting not only those servers but the global Internet routing
infrastructure. This morning MV was experiencing some instability
on one of our core routers because of this, which we addressed by
making some configuration changes. There will
likely be more net-wide impact as the worm continues to spread.
- Sep 17 2001 - news server
- The news server was down for about 15 minutes this afternoon
while we moved it back from a temporary location to its home spot.
- Sep 7 2001 -- Nashua dialup
- At long last the Nashua dialup service is being folded into the 1-500
service. (This has been an ongoing project for almost a year now.)
Currently we have dialin service based in Nashua, and we have a T1 circuit
that connects our Manchester NOC to the dialin equipment in Nashua. We are
rolling the Nashua dialin service to the PRI trunk group supporting our
1-500 number in the Nashua area, and moving the equipment from Nashua to
Manchester to support it. There will be some downtime (probably from 1 to 2
hours) beginning at about noon while this equipment is being moved.
Afterwards, you should be able to continue using the same Nashua numbers as
before (either 886-7124 or 886-6688), or use the 1-500 number. There will
be some benefits to this new arrangement: more ports will be available, and
your calls will come right into our Manchester NOC rather than being served
off of a T1 to the Nashua POP.
Status Sep 7 14:00 : The equipment has been moved and
is installed, and we are waiting for Verizon to complete the order to
re-point the old numbers to the new trunk group. We expect this to
take 1-2 hours. In the meantime you can call the 1-500-699-6387
number if you are calling from a Verizon number in NH,
or you can call the Nashua number that's already attached to the
new trunk group: 879-9009.
Status Sep 7 16:50 : The original numbers are working
again although we've had evidence of occasional busy signals even though
the trunk groups are nowhere near full. We've asked Verizon to check
the configuration to make sure it is correct.
- News server Tue Aug 30 14:30 2001
- The news server is currently offline for a RAID filesystem
rebuild. Downtime is expected to be several hours.
News services are expected to be restored by sometime later today.
Status 17:10 : The news server is back up; the news
spool was essentially wiped clean by the rebuild (we did preserve
some of the local hierarchies). Unfortunately this was necessary
since the server had been running in a RAID degraded mode since
August 16, slowing it down considerably, and attempts to repair
the array failed. We had to rebuild the array from scratch.
- News server Sat Aug 18 morning EDT 2001
- The news server crashed during an attempted rebuild of one
of the RAID disks (see Aug 16 below). News
services were down from 5:55 until 7:00.
- News server Thu Aug 16 05:00 EDT 2001
- There was a disk failure in the RAID array of our news server
earlier this morning. This did not bring down the service, but was
something that needed to be addressed immediately. During an
attempted rebuild of the failed disk, required to integrate it back
into the RAID array, the server encountered a problem and rebooted.
This did bring down the news service, which is still down at this
time.
Status 08:18 :
News service has been restored. The disk rebuild was less than successful,
and the RAID array is again running in degraded mode. However, there
should not be any noticeable performance loss and full news services are
currently functioning. Another rebuild will be attempted, possibly later
today.
- August 7 2001 - DSL/C DSL
- Starting at about 6pm on Aug 7, a pair of issues affected all
DSL/C customers for up to 2 hours. The first issue was triggered
by a customer doing something unusual that caused nearly 1Mbps of
traffic to be flooded to all users. This amount of traffic
resulted in congestion and high latency on all DSL/C customer
circuits. This sort of flooding is not supposed to occur and we
expect the telco engineers responsible for the DSL loop network to
address it. While we at MV were trying to identify the nature and
the source of the problem, we installed some aggressive filters that
ended up causing the second issue: because of the filters, some
DHCP requests could not make it through to our DHCP server. (Our
filtering was nearly moot: the first issue occurred outside our
boundary and our filtering could not address it. However, it did
allow us to track down the problem.) Both problems were addressed
within about two hours.
- July 25 2001 and onward -- router upgrades
- We will be doing work to upgrade software in one of our core
routers. There is a configuration incompatibility with the new
OS that we need to work out: we will be doing this by loading the
new software at around 3AM, working with it, and putting the old
software back after an hour or two. This process requires at least
two reboots of the router. The reboots cause fairly minimal
disruption but we will be using this time of day to avoid impacting
most people even that much, except those of you who are night owls.
Note that this may happen multiple nights, but may not happen every
night.
- July 24 2001 - telnet access disabled
- Due to security problems found in telnet, we have disabled
telnet access to our shell server and to other systems. We have
no ETA on correction of these problems, but telnet access will
remain unavailable until there is a solution. We sincerely
apologize for any inconvenience.
Status July 26 01:00 : A patched telnetd was obtained
from the OS vendor and installed on the shell server. Telnet access
has been re-enabled. Use of ssh for access is still recommended ...
- July 23-24 2001 -- router crashes
- We have been experiencing frequent crashes of one of our
core routers. DSL users will have noticed short intervals
of Internet unavailability during the router reboots. All
users may have noticed varying response to the Internet as
routing falls over to other paths during the router downtime.
We are working on trying to identify and correct this problem.
There may be brief unannounced maintenance periods while we
upgrade the OS and lay hands on the hardware.
Status July 25: Replacing a RAM stick in the router appears to
have eliminated the crashing problem. Although we have seen no further
crashes in the 30 hours or so since this replacement, the passage of more
time without crashes will be required to give more confidence in this. Note
that we will still be doing some experimentation with software upgrades per
another note above.
- Web server Friday July 20 6:30 AM - 7:00 AM
- A system upgrade is planned for our primary web server. This
is expected to involve about a half hour of downtime.
Status: took a little longer than expected; back up by 8:30am.
- News server Monday July 16, 22:30
- The news server is down for a short while in order to physically
relocate it back to its correct location, after its temporary move
for the repair on July 13.
Status: back up by 11pm.
- News server Friday July 13 23:00 - 23:20
- Short downtime to disable an alarm caused by a faulty
air circulation fan.
- Frame Relay July 11
- As of about 10:30 pm, Verizon had a failure on one of their
frame relay switches. Several of our frame relay customers are
down as a result. We do not yet have an ETR.
Status July 12 01:45 Frame relay circuits that were down
came back up at this time.
- core router Tue, 19 Jun 2001 19:55
- One of our core routers locked up for reasons that are still unclear.
This affected a portion of our network, including all of our DSL customers.
The outage lasted about 15 minutes.
Other customers may also have been affected briefly by a quick manual reload of
one of our other core routers, which was necessary to the restoration
process.
- Web server Saturday June 9 1:00 AM - 2:00 AM
- A system upgrade was planned for our primary web server. This
was expected to involve about an hour of downtime.
Outcome: the downtime did last the full hour, but was unsuccessful
in part because of a sudden unexpected failure of the machine's
power supply. Needless to say, the upgrade will be rescheduled for
a future date.
- June 2: ATM/Frame Relay Maintenance
- VAD (Verizon Advanced Data) will be doing maintenance on their
frame relay and ATM switches between midnight and 8AM June 2, 2001.
Each switch will be down for up to three hours. This will affect
frame relay and DSL/V customers.
Status Saturday June 2: We did not observe any outages
from this effort; we will need to find out from VADI whether the work was
cancelled.
- shell server June 1
- The shell server crashed at about 2:15pm today with a kernel
panic (uptime was 146 days, 23 minutes). It was back within about
20 minutes.
- copper (web server) May 29 10:30PM
- The web server had to be rebooted at this time to correct a problem
with its ARP table. It was down for about 15 minutes.
- DHCP server May 20 9PM
- The DHCP server stopped functioning this evening and was
restarted after being reported by a monitoring facility.
- News server Friday April 13 2001 @ 1AM-2AM
- The news server will be going down for a scheduled hardware exchange.
The downtime is expected to be less than an hour.
- News server March 19 9AM
- The news server is currently not accepting incoming articles. We
hope to resolve this as soon as possible today.
Status: 1pm : Space was freed on the news disk and the
server is again accepting articles.
- News server March 3 1AM
- The news server will be down starting at 1AM for some experimentation
with hardware. Downtime should be 1 to 2 hours.
Status: Back up at about 2:30.
- Feb 16 2001, news server
- The news server was down from about 2pm through 2:30pm for
a rebuild of the message-id history index file.
- Feb 7 2001, Nashua T1
- At about 4:45pm the T1 to our Nashua POP began taking errors, causing
poor performance and sometimes unreachability for users dialing into the
Bell Atlantic Nashua numbers (886-7124 and others).
Status 6:15pm: Resetting some of the
communications equipment appears to have cleared up the problem.
- Jan 20 2001 -
- As of early morning our DS3 ATM circuit to Verizon is down,
affecting all DSL/V customers and also impacting one of our
backbone connections. Verizon NOC numbers are busy or have
backed up call queues, indicating some significant problem in
Verizon land. More info when we can get through.
Status 1:31 : Verizon informs us that they are doing a
planned upgrade of their ATM switch, which is expected to take
another 6 hours. Since we had no advance notice of this, we were unable
to inform you in advance.
Status: Circuit came back up some time before 8AM.
- Web server Jan 6 2001
- Our web server (www.mv.com and other pages located on this
same system) is sluggish this afternoon due to a high demand
situation. We expect it to clear after a while, so please be
patient.
Status: High load lasted through the end of the day
and appeared to have lessened to more reasonable levels overnight.
Note: the high load was due to publishing of new contest information
for the USFIRST competition, which was in high demand. See
www.usfirst.org for information
about that organization.
- Server outages Jan 6 2001 around noontime
- Both our mail and web servers went down for a while. While
we were fixing this, shell access was also crippled. All three servers
were rebooted and are back up as of 1:15pm.
- Router upgrade Jan 5 2001 1AM
- At 1AM or later the morning of January 5, 2001, we will be
adding a card to our gw-bnh-8 router. The router should only be
down for a few minutes. While the router is down, one of our
backbone Internet connections will be offline, but most users should
see only a transient effect. DSL customers will be affected for
the duration of the upgrade.
Status: Successful upgrade with only a few minutes of downtime.
Copyright © 1998 thru 2008 MV Communications, Inc.