<html>
<body>
Dear Happy M5 Hosting Customer,<br><br>
<x-tab> </x-tab>Today I am
writing to explain some recent interruptions in our service. Every
growing business has its challenges. This month, we had a few challenges
which affected the service you trust us to provide. <br>
<x-tab> </x-tab>When I
started M5 Hosting, one of the first things I decided was that
communication, honesty and integrity would be core values in how we do
business. I know many businesses brag about these very same
principles. I myself have been a customer of other hosting companies in
the past. I was almost always disappointed by them in terms of those
three things: communication, honesty and integrity.<br>
<x-tab> </x-tab>In keeping
with those core values, I am writing to you today to honestly communicate
a few challenges we have had this month, how they may have affected us
all, and what is being done to address them.<br><br>
<b>1/10/06 - after 4:30pm PST - Intermittent high packet loss and lost
TCP sessions for all customers<br>
</b><x-tab> </x-tab>
Generally, what any of our customers do with their server is entirely up
to them, except when it affects other customers or is illegal. In this
case, what a relatively new customer was doing was both illegal and
disruptive. It took some time to diagnose what was going on. It turned
out to be resource exhaustion on the firewall. Specifically, the state
table had reached a configured hard limit. The firewall is capable of
a far higher limit than the default value, so we raised the limit. It
wasn't until later that it was determined that the traffic was due to
illegal actions of one customer. This customer has been removed from the
network.<br>
<x-tab> </x-tab>What have
we done to mitigate the risk of this happening again? We have
increased the capacity of the firewall to 5x what it was. We have also
optimized the rules so that the current network load uses about 10% of
the system resources that it did before this incident. So, overall, we can
handle about 50x more traffic before this will be a problem again.
Additionally, we have more clearly defined our anti-fraud policy. If we
had followed our policy, this new customer would not have been
accepted.<br><br>
<b>Evening of 1/28/06 - Network outage for most customers.<br>
</b><x-tab> </x-tab>They
say that human error accounts for 70% of all computer downtime. I'll bet
it's even higher than that. This outage was human error. While optimizing
the firewall to mitigate the risk described above, a simple typographical
error rendered the firewall impassable to almost all traffic.
Unfortunately, this also locked us out of the firewall. Generally the 24hr
NOC staff at the Data Center facility are pretty responsive. In keeping
with Murphy's Law, right when we needed them most, there were some
issues with their land-line phones which delayed recovery of the network. By
the time we got through to them on the phone, we were halfway to the
data center (the data center is only 10 to 15 minutes away). Rather than walk
them through the procedure to recover, we had them open the door to the
rack and connect a crash cart to the firewall, in preparation for our
arrival. We arrived about 2 minutes later and remedied the problem
quickly.<br>
<x-tab> </x-tab>It was
my own fingers that caused this outage. I apologize for the mistake.<br>
<x-tab> </x-tab><br>
<b>Late evening on 1/28/06 and early afternoon on 1/29/06 - Shared
Hosting server outage<br>
</b><x-tab> </x-tab>The
server named "Witt" suffered a very rare kernel panic last
night and again this afternoon. The system came back up without issue
once it was rebooted. However, the fact that it has happened twice in
18 hours is not a good sign.<br>
<x-tab> </x-tab>What are
we doing to resolve it? We have reviewed the log files on the system but
have found no useful information relating to these two incidents. We have
upgraded the kernel to the latest available from RedHat, to ensure that
all fixes and patches which may relate to this issue are applied. This
evening we will take the system down and physically inspect the hardware
to ensure that the fans are all working properly, loose wires are
covered, cards are seated properly, etc. At that time we will replace the
RAM entirely (since memory problems are hard to find). We will take
further action if required.<br><br>
<x-tab> </x-tab>I hope you
find value in this email. I hope that you appreciate the communication
and the honesty. It would be so much easier to not have to write this
email and pretend none of it ever happened. But that would make us like
your last provider. We want to remain your provider. We want you to brag
about us, not complain about us.<br>
<x-tab> </x-tab>Your
feedback on this email, the effects of the incidents described above, or
the steps we are taking to mitigate them is always welcome. I'd really
like to hear from you.<br><br>
Thank you for your trust and your business!<br>
Mike<br>
<x-sigsep><p></x-sigsep>
************************************************************<br>
Michael J. McCafferty<br>
Principal, Security Engineer<br>
M5 Hosting<br>
<a href="http://www.m5hosting.com/" eudora="autourl">
http://www.m5hosting.com</a> <br><br>
You can have your own custom Dedicated Server up and running today!<br>
RedHat Enterprise, CentOS, Fedora, Debian, OpenBSD, FreeBSD, and
more<br>
************************************************************</body>
</html>