Outage incident and our new monitoring setup

September 16, 2010

Today around 18:00 GMT, two of our front end servers hit the limit set by Apache’s MaxClients directive. After receiving Pingdom alerts, it took us 10 minutes to find the problem, change the setting, and reload Apache. During that time you may have noticed poor performance and timeouts, and for that we apologize.

Our analysis points to some legacy HTTP load-balancing code left over from when we ran Bitbucket on EC2. We’re implementing a fix and will deploy it to production soon.

Since moving off EC2, we’ve been working hard to improve our monitoring, which will help in times like this. For instance, we’re testing Monit, which could have detected this problem automatically and bounced Apache. We’re also working to expand our live functional tests, including Mercurial operations such as checkouts, using Twill and Kong.
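To make the "detect and bounce" idea concrete, here is a minimal Python sketch of that kind of watchdog. It is not Monit's implementation or configuration, only an illustration of the general pattern: poll a URL, count consecutive failures, and restart the service once a threshold is reached. The names `is_healthy`, `Watchdog`, and `restart_apache`, the threshold of 3, and the use of `apachectl graceful` as the restart command are all assumptions made for the example.

```python
import http.client
import subprocess
import urllib.error
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True if the URL answers without an HTTP error within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Covers HTTP 4xx/5xx responses, refused connections, and timeouts.
        return False


def restart_apache():
    """Hypothetical restart action; the right command depends on the system."""
    subprocess.run(["apachectl", "graceful"], check=False)


class Watchdog:
    """Call `restart` after `max_failures` consecutive failed checks."""

    def __init__(self, check, restart, max_failures=3):
        self.check = check
        self.restart = restart
        self.max_failures = max_failures
        self.failures = 0

    def tick(self):
        """Run one check. Return True if a restart was triggered."""
        if self.check():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.restart()
            self.failures = 0
            return True
        return False
```

In use, something like `Watchdog(lambda: is_healthy("http://localhost/"), restart_apache)` would have its `tick()` method called on a timer, for example once every 30 seconds. Requiring several consecutive failures before restarting avoids bouncing the server on a single slow response.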

For those of you interested in our hardware setup, the new front end machines each have 16 cores and 32GB of RAM. Since migrating from EC2 to Contegix we’ve rarely seen the load average go above 2 (about 12% of capacity), whereas on EC2 we were at or beyond 100%.
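The percentage quoted above is simply the load average divided by the number of cores. A small Python sketch of that arithmetic follows; the function name `load_percent` is made up for illustration.

```python
def load_percent(load_average, cores):
    """Express a load average as a percentage of the available cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return 100.0 * load_average / cores


# The figures from the post: a load of 2 on a 16-core machine.
print(load_percent(2, 16))  # 12.5
```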

  • http://twitter.com/Twirrim Twirrim

    Zabbix will give you similar functionality, if you want to mix in resource monitoring and graphing too.

  • Wez Furlong

    You may want to look at http://circonus.com/

  • http://twitter.com/iElectric Domen Kožar

    Guys, you should seriously switch to nginx, apache is a fork bomb.
