During our migration from EC2 to dedicated hardware we took the opportunity to improve our monitoring using a combination of twill, Monit and Pingdom. If operations such as checkouts, sign-ups, logins, forks or commits fail or slow down, automated alerts fire and our operations team is notified by email, SMS and a phone call from Google Voice. This is in addition to the 24×7 monitoring, operations and alerting that our hosting provider, Contegix, provides as part of its Beyond Managed hosting service.
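To give a rough idea of what a "fail or slow down" check can look like, here is a minimal Python sketch. It is not our actual monitoring code: the names `run_check`, `CheckResult` and `needs_alert`, and the 2-second `slow_after` threshold, are assumptions made up for illustration. Each operation is timed and classified as ok, slow or failed, and anything that is not ok would trigger an alert.

```python
import time
from typing import Callable, NamedTuple


class CheckResult(NamedTuple):
    """Outcome of one service check (hypothetical structure)."""
    name: str
    status: str        # "ok", "slow", or "failed"
    seconds: float     # how long the operation took
    detail: str = ""   # error message when the check failed


def run_check(
    name: str,
    operation: Callable[[], None],
    slow_after: float = 2.0,                      # assumed threshold, in seconds
    clock: Callable[[], float] = time.monotonic,  # injectable for testing
) -> CheckResult:
    """Run one operation, time it, and classify the result."""
    start = clock()
    try:
        operation()
    except Exception as exc:
        # Any exception counts as a failed check.
        return CheckResult(name, "failed", clock() - start, str(exc))
    elapsed = clock() - start
    status = "slow" if elapsed > slow_after else "ok"
    return CheckResult(name, status, elapsed)


def needs_alert(result: CheckResult) -> bool:
    """A check that is slow or failed should notify the operations team."""
    return result.status != "ok"
```

In a real setup, `operation` would be something like a twill script or an HTTP request that performs a login or checkout against the site, and the alert delivery itself (email, SMS, phone call) would sit behind `needs_alert`.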
Over the last week we’ve taken things even further and have integrated our tests and monitoring into a real-time server status site: status.bitbucket.org.
Moving forward, if a service check fails, status.bitbucket.org will automatically update with an alert banner indicating the problem:
When things are resolved, you’ll see a screen letting you know “things are forking awesome!” with an update on actions taken by the Bitbucket team:
To make following our updates even easier, we’ve set up an RSS feed as well.