MV Communications: 1998 outages/events
1998 outages/events
- News server Dec 11 1998
- The news server blew a power supply at about 3pm; we temporarily
replaced the PS and the server was back up at about 4:15.
- Nashua Dec 1 1998
- There was a brief outage in Nashua this evening around 9:20PM
while we investigated a possible problem in the router (there was
one, and it was repaired). The repair only took about 5 minutes,
however the Bell Atlantic PRI number (886-7124) was inadvertently
left down until about 10:45PM.
- Tuesday November 17 02:00-03:00
- There is another planned outage scheduled for 2:00 AM Tuesday morning
that will be used to continue some of the work that could not be
completed during the Sunday morning downtime. Again, this planned
maintenance is directly related to readying things for the BBNPlanet link.
However, this time the outage should only include the gw2-55bridge
router in Manchester.
Status: Unfortunately same result as before: the router
is giving us a difficult time taking the upgrade.
- Sunday November 15
- In the wee hours of the morning we'll be doing some miscellaneous
maintenance: at about 1:30 AM the gw-nnh-1 router in Nashua will be
taken down for about 15 minutes to add some memory. And at about 3:00 AM
the gw2-55bridge router in Manchester will also be taken down for at
least 30 minutes to add some memory. Also around this latter time the bnh
dialin access servers (serving the Brooks numbers) will be taken down for
some brief maintenance.
Status: All was accomplished except for the Manchester
router upgrade, where there was a glitch; this will need to be rescheduled.
- News server Nov 11-13
- Overnight November 11/12 and then again early November 13 the
news server was out for a period of an hour or more. After the first
outage one of its RAID disks was marked as failed and we underwent a
rebuild operation during the day November 12 (the server was up and
available during this process). We believe that there may have been
a power supply issue that was related to all of these incidents; some
changes were made as a result.
- Salem ...
- The link to Salem went down at about 3:15AM Thursday Nov 5; given
the outage last night Bell Atlantic was called immediately. At this point
they believe they see a problem internal to their system and are trying to
confirm it and narrow it down with some more tests.
4:55 : After some thorough investigation (checking multiple
test points) the Bell Atlantic technician believes that there are errors
communicating with the network interface unit (NIU) at the Salem location.
This person also believes the report last evening of a fiber problem was
erroneous, that whoever gave us that report was confusing it with a fiber
problem in Nashua. The link was released from testing and came back up.
Bell Atlantic will be dispatching someone to check out the NIU in Salem
during the day today; this means that there may be another outage period
while they replace it. There may be further instability in the line
until the repair is made.
- Wednesday Nov 4, Salem
- At about 21:30 the link to Salem went down and was showing a lot of
framing errors in one direction. We went on site to check our equipment at
the side that was showing errors but without touching it found that the line
had cleared by about 21:50. It's possible that Bell Atlantic threw the line
over to another trunk; we'll see what we can find out.
Update 22:20 : BA reports that the outage was due to a
fiber failure (of unspecified nature) that was repaired.
- Tuesday November 3 1998
- At about 12:53 both T1s between Litchfield and our Manchester office
went down. Bell Atlantic was called in for repair and quickly found the
cause: apparently while they were working on installing one of our new T1s
they disconnected these two. The outage caused a separation of our
Manchester office from the rest of the world; users dialing into the "bnh"
(Brooks) numbers could access our servers but not the rest of the net, users
dialing into other locations could access the Internet but not MV's servers.
Status 14:35 : Both T1s back in service.
- Friday October 30 1998
- At about 11AM the T1 between Litchfield and Nashua went down. The T1
was given to Bell Atlantic for testing. BA did not find a problem but, as
so often happens (to the point where some Bell Atlantic technicians
recognize this as a way to repair a down circuit), after they finished the
line came back up working again. Connectivity was restored at about 3:50PM.
During the outage, users dialing into the Nashua POP (that is, the Bell
Atlantic numbers 886-6688 and 886-7124) were not able to reach Internet
services or MV servers (note however that the BNH numbers serving Nashua
were not affected). This outage also cut us off from one of our links
to the Internet backbone; the full load switched over to the Sprintlink connection.
Note: See news for related information about
upcoming changes in our connectivity.
- Sunday October 11 1998 11AM
- At about 11AM we will be bringing down all of the servers in
order to do some brief work on the power. Downtime should last about
an hour.
Status: Everything was back up by noon.
- News and Mail servers, Oct 2 1998
- We had a heating system run amok here and had to shut down the news
server temporarily at about 11:25AM; it was back up as of about
12:55 when the heat was again under control. The mail server was
down for about 15 minutes during this period as well.
- Nashua Oct 1 13:35
- There is a power outage in downtown Nashua affecting our Nashua
POP, and therefore one of our backbone links to the Internet.
The Nashua Bell Atlantic access numbers will ring busy during this
outage, however the BNH (Brooks) access numbers are still available.
Status 15:05 : Back up.
- Sunday, Sep 27 1998
- Beginning at 11AM we will be doing some work in our server area;
the shell server will be brought down to swap its ethernet card and
to change its network wiring, and other servers and access equipment
may be brought down briefly so that we can do some physical changes
(wiring and location). We expect this to be over by 1PM.
Status: We finished most everything by about 1PM;
there was also a quick reboot of the shell server at about 1:45 in
order to re-attach an external SCSI device.
- Sep 19 1998
- The news server (news.mv.net) was rebooted at about 11:54AM
mainly to restart one of its daemons. It was back up within about
15 minutes.
- Sep 7 1998
- An early morning violent thunderstorm took out our link to
Nashua at about 5AM -- service was restored at about 11:00.
- Aug 18 1998
- A recall was issued on some recently-shipped modem cards used
in the Livingston pm3 remote access servers we use for our K56
lines. We had to pull and inspect each card to see which (if any)
needed to be returned. Since we wanted to get this done ASAP we
did this during the afternoon, and this resulted in hanging up
modems during the 20-minute (or so) period that it took, which
was from about 3:20 PM through 3:40 PM. FYI we identified 8
cards (78 modems) that will be returned and replaced-- the recall
relates to excessive retrains and some failures to connect in
some environments.
- Aug 15 1998
- The Nashua K56/ISDN number (886-7124) is ringing fast-busy --
our equipment shows the circuits up, so it's likely a number
translation issue. Bell Atlantic has been sic'd on it.
Status Aug 18 15:00 : Fixed by BA
- Aug 11 1998 10pm
- Salem: Power was out in our Salem POP, probably due
to a fast-moving thunderstorm, from about 10:00PM Tuesday night
through 00:45 Wednesday morning.
- Aug 11 1998 14:30
- There was a power outage in our Manchester office at about
2:30PM which affected all of our servers. Most everything was back
up at about 3:00. The news server (which was also back up at this
time) had some data errors (likely because of the outage)
which caused some trouble in the early evening as well.
- Aug 2, 1998 13:00
- We will be doing some work in our computer room starting at 1PM-
the shell system (mv.mv.com) will be down for a short period -- it should
be back up by 2pm.
Status: This took somewhat longer than expected, but it
was back up by about 3:15.
- Aug 1 15:00
- There was a power outage in our Manchester office today which
took down the servers there. Most servers are back up but we are
still working on getting the news and mail servers back online.
Systems were back up as of about 16:00 - but at that time we took
the news server offline to do another filesystem check there. It
was down for another 20 minutes or so.
- July 28 11:30
- We have temporarily disabled access to the mail server while a
problem with mail forwarding is being looked into.
Status 12:05 : server is available again
- July 25 5:30pm thru 9:30pm
- At about 5:30 pm we lost connectivity to a number of servers at
our Bridge Street NOC. This was traced to a loss of disk on one of the
servers -- a server that is, ironically, scheduled to be retired in
the immediate future. We rebuilt the disk and restored system-related
files from backups, and everything was back up by about 9:30pm.
- July 14 4pm
- The news server will be down for a few minutes between 4pm and
4:30pm while we move it a few feet.
- Salem July 12 1998
- The link to Salem is out at this time- Bell Atlantic has the circuit
for testing.
Status: Back up at about 7:30pm.
- June 27
- The news server will be taken down at about 11AM for some
hardware adjustments. Outage should be no more than an hour.
- Servers Jun 23 1998
- We lost a UPS (battery backup unit) at about 6:15 tonight
resulting in a couple of servers going down (including nameservice
and the shell machine). Systems were fully back by about 7:05.
- Nashua June 6 1998
- There will be a scheduled power outage in our Nashua facility
on Saturday June 6 from approximately 7AM through 9AM. During
the outage our Nashua Bell Atlantic numbers will be down, as
will one of our Internet links.
Status: Over as of about 7:30; downtime was
about 10 minutes.
- May 31 1998
- There are violent thunderstorms passing through New Hampshire this
evening, with more on the way. Affected MV POPs:
- Peterborough: down as of about 18:19 due to power outage.
Update 1:00AM back up.
- Nashua: down as of about 18:40 due to power outage in
Nashua. This impacted one of our Internet connections, but routing
switched to our Sprintlink connection automatically.
Update 19:25 : Back up at this time.
- All hubs, May 19
- There is an issue with one of the routers at our main
hub in Litchfield. This problem is being worked on. The symptoms
will be intermittent outages for users. For example, you may be
able to get to a site, but then on a reload of a web page, receive
a timeout error.
Status: Problem was located and solved at 2:15pm
Addendum: Repercussions intermittently affected Nashua dialins
somewhat after this time, but were also fixed later in the day.
- Brooks dialins May 7
- Our Brooks dialin lines (the K56flex/ISDN numbers serving
Manchester, Concord, Dover, and Peterborough) will be down from
about 9AM to 10AM as Brooks does some reconfiguration on the
PRI trunk groups.
Status: This was done about an hour late, but it
was done successfully.
- Salem April 29
- We are experiencing problems with the link to Salem and the link
has been down and up throughout the day. We are working to get this
resolved.
Status 16:30 : Up and stable again as of this time
- Salem April 28/29
- The link to Salem went out at about 23:25 April 28. After checking
our equipment we asked Bell Atlantic to run a test on the link.
They tested it for about 15 minutes, and it ran clean. Once they
exited the test, the link came back up (at about 00:15 April 29).
(This is a phenomenon that the telco test people tell us happens
quite a bit. Sometimes the act of putting the line into and
taking it out of test mode clears whatever was wrong with it.)
- Power at MV office; April 24, 10:50am
- PSNH is shutting down all power at our Manchester office building, due
to potentially hazardous electrical problems in the basement. The staff has
been forced to leave the building at this time.
Our servers are all located in this building; they are on UPS's which
will keep them running for a while, but if the outage lasts too long,
service might be interrupted. We hope the interruption will be
as brief as possible (or non-existent), and that the staff can get
back to their phones and stations soon.
Status 13:30 : We're back in, and working on bringing
servers back up
14:00 everything is back.
- News server April 23
- The news server is down as of about 3pm; it had a glitch in one
of the component RAID disks and we are rebuilding the parity information
in the RAID array. We don't expect any lossage, but it will be down
for a few hours while it rebuilds.
Status: Back up at 18:00
- April 19, 1998
- News server:
The news server was offline from about 15:30 through 16:30 due to
an outage in the ethernet segment that it is on.
- April 15, 1998
- Litchfield:
Bell Atlantic is again going to try to do a conversion on the SLC in
our Litchfield location. This will happen at 3AM early Wednesday
morning April 15, and should take a couple of hours.
At the same time, we will take this opportunity to do some minor
equipment relocation in Litchfield, which may result in occasional
small interruptions.
Status 4AM: The conversion took about an hour and
looks to have been successful. The second part (equipment
relocation) turned out not to be possible at this time and so
was not done.
- April 12 noon- 15:30
- We're doing some work in our server room which may result in
some brief interruptions to one server or another this afternoon.
Duration of any interruptions should be just a few minutes.
- April 6 1998 10:45am
- The news server will be taken offline, so that a replacement
SCSI RAID controller card can be installed. Downtime should
be no more than an hour if all goes well.
- March 29-30 1998
- The news server crashed at about 19:00 Sunday night due
to an apparent SCSI failure. Several attempts to preserve the
news spool failed, and we finally elected to rebuild the filesystem,
which means that all articles on the news server were lost. Further,
the server is operating in a degraded mode until the SCSI problem
can be corrected. The system was back up at about 3AM March 30
in this mode.
- March 20 1998
- mv.mv.com (the shell system) will be taken down today at
around 11AM to replace a failing disk drive. Downtime will
be several hours as the data is copied to a new disk.
Status: Finished as of about 14:00
- March 18 1998
- We found a malfunctioning port in Litchfield which was preventing
IP callers from making a successful call on that port. Since the port
was fairly near the beginning of the hunt group, many callers were
affected. The modem and phone line were moved to a free port on another
terminal server.
- March 18 1998
- PSNH is replacing a power meter at our Manchester office
today at noon. This may result in a brief outage of one or
more of our servers.
Note: this was originally scheduled for March 17
but was pushed back a day.
- Mar 16 1998
- There was a power outage in Peterborough and surrounding areas
this afternoon; our Peterborough site was down from about 13:20
through 14:05.
- Mar 3 1998
- The link to the Manchester dialin modems is offline; we are on
our way to investigate.
Status: The CSU in Litchfield had been disturbed by
construction workers on premises and was in a loopback mode.
- Mar 2 1998
- The link to Keene is down as of about 9:15 PM, no ETR as yet.
Status: Up as of March 3 AM, problem was a wedged
CSU/DSU in Keene
- Feb 25 1998
- Litchfield: The SLC conversion originally scheduled
for Feb 13 has been rescheduled for Feb 25 at 3AM. See the original
note below for details, everything is the same except the date.
Status: The operation was not successful; BA stopped
at about 5AM without success, put the unit back the way it was, and
promised to reschedule after they could figure out what was going wrong.
After they left, 6 lines were found not working; we called and had
them busied out until they can be repaired.
- Feb 23 15:00
- pm-1 (the first terminal server in Litchfield) had to be replaced
due to periodic problems which required a site visit to reboot the beast
(there were two such problems over the weekend). A replacement was
swapped in, which we expect to stabilize the situation.
- Feb 17 13:44
- The news server is currently down; no ETR as yet.
Feb 18 00:36 Back up, after tracing down the
problem to two bad SCSI connectors. Gaudy details are in
the mv.info.outages newsgroup.
- Feb 13 1998
- Litchfield:
On the morning of Friday February 13, 1998, Bell Atlantic will be
doing some work on our phone connections in Litchfield. The phone
lines in Litchfield are connected to a SLC in our office, which is
connected to the Bell Atlantic switch in Merrimack by a direct fiber
link. BA will be converting this SLC to an "Integrated" carrier
unit, meaning that the phone lines will bypass the analog part of the
CO switch and instead be connected to the trunk side. This may
improve connection rates in Litchfield (we'll see after the
conversion); but BA is doing this in order to relieve congestion in
the Merrimack switch.
The work is scheduled to start at 3AM Friday morning, and they
expect to be done by around 5AM.
Update: This was cancelled by BA and will be rescheduled
soon.
- Litchfield Feb 7 - 8
- As part of our equipment move within the building, the remaining modems
are being relocated this weekend. We will make every attempt to do this as
gently as possible, e.g. by busying out groups of lines before moving the
modems attached to them. However, this is a lengthy process and some
glitches are inevitable.
- Feb 7 1998
- There were two power outages in Litchfield this morning
due to some tinkering by construction people. They said that it
was accidental-- once at about 8AM and again at about 11AM.
- Jan 24 14:xx
- The link to Dover is down after power outages in both
Litchfield and Dover.
Jan 25 12:15 AM: Dover is back up following on-site
visits by MV and Bell Atlantic. (BA was swamped with outages
over the past day due to the ice storm; we appreciate their extra
effort and long hours.)
- Jan 24 14:00
- The shell server mv.mv.com is down to replace a disk and add
disk capacity (see also the note about this morning's power outage
below).
- Jan 24 1998
- After last night's ice storm Litchfield was without power from early
this morning through about 2PM, which meant that we were mostly down. We
took this, um, opportunity to perform a needed disk replacement, as well as
addition of disk space, on the shell server (mv.mv.com). We expect this
operation to be completed by around 4PM.
- Jan 21 1998
- The Nashua link will be moved back to a replacement cisco router during
the early morning hours overnight Jan 20/21, no earlier than 2AM.
This operation will bring
down both the Sprintlink and the Nashua connections to Litchfield while
the equipment is being moved. Duration should be around 15 minutes.
2:15 AM: Move successfully completed, Nashua link is
back up with full routing at this point.
- Jan 20 1998
- Dover: The access servers in Dover were offline from
about 15:45 through 17:00. The cause was, regrettably, pilot error
on our part while we were working on another issue. We apologize
for this downtime.
- Litchfield Jan 19, 1998
- Starting at about midnight Sunday night/Monday morning, we will be
doing some rearrangement of the first 30 modems in Litchfield. We will make
all efforts to keep the disruption to a minimum.
Note: this was originally scheduled for Sunday morning but
was pushed back due to the events in Salem.
- Jan 18 1998
- There is a power outage affecting areas in Salem and Windham;
our Salem POP is down as a result as of about 12:05AM.
Granite State Electric reports that crews are working on the
problem but have no ETR.
Update 2:45 AM: Power was restored at about 1:00AM;
however, the pm-snh-1 terminal server did not come up. On arrival
it was found that this terminal server was nonfunctional. We are
working on getting it restored or replaced, hopefully before 5:00.
4:10 AM: pm-snh-1 was replaced with a spare and is back
up.
- Nashua Jan 15 1998
- The T1 between Litchfield and Nashua is out as of about 5:30 AM.
Bell Atlantic has the line for testing.
Update: The problem appears to have been caused by
equipment failure at MV, likely a router port. We've restored
connectivity to Nashua by moving the link to another router; however
this affects the way data is routed between our multiple upstream
connections. While incoming data will still be split between
the two providers using optimum paths, most outgoing data will
be sent via Sprintlink (except for Nashua users, where outgoing
data will be sent via the Destek connection). Because of the
upcoming storm, this may take until some time next week to
resolve.
Update Jan 16: The link to Nashua has been showing a lot
of CRC errors on the Nashua end since being brought back up on the
new router, resulting in suboptimal performance. As always, watch
the mv.info.outages newsgroup for more info.
Jan 20: The link has been running clean since about 12:30
today. We will be moving it back to the cisco router overnight and
hope to have it back in its fully functional condition at that point.
Jan 21 2:15AM: Link is back up and fully operational.
Copyright © 1998 thru 2008 MV Communications, Inc.